Chapter 1

Probability Distributions 

1.1 Aims and Motivation for the Course 1 

We aim to: 



• Develop a theory which can characterize the behavior of real-world Random Signals and Processes;

• Use standard Probability Theory for this. 

Random signal theory is important for 

• Analysis of signals; 

• Inference of underlying system parameters from noisy observed data; 

• Design of optimal systems (digital and analogue signal recovery, signal classification, estimation ...); 

• Predicting system performance (error-rates, signal-to-noise ratios, ...). 

Example 1.1: Speech signals 

Use probability theory to characterize that some sequences of vowels and consonants are more 
likely than others, some waveforms more likely than others for a given vowel or consonant. Please 
see Figure 1.1. 

Use this to achieve: speech recognition, speech coding, speech enhancement, ... 



1 This content is available online at <http://cnx.Org/content/ml0983/2.4/>. 
Figure 1.1: Four utterances of the vowel sound 'Aah'.

Example 1.2: Digital communications

Characterize the properties of the digital data source (mobile phone, digital television transmitter, 
...), characterize the noise/distortions present in the transmission channel. Please see Figure 1.2. 

Use this to achieve: accurate regeneration of the digital signal at the receiver, analysis of the 
channel characteristics ... 
Figure 1.2: Digital data stream from a noisy communications channel.



Probability theory is used to give a mathematical description of the behavior of real-world systems which
involve elements of randomness. Such a system might be as simple as a coin-flipping experiment, in which 
we are interested in whether 'Heads' or 'Tails' is the outcome, or it might be more complex, as in the study 
of random errors in a coded digital data stream (e.g. a CD recording or a digital mobile phone). 

The basics of probability theory should be familiar from the IB Probability and Statistics course. Here 
we summarize the main results from that course and develop them into a framework that can encompass 
random signals and processes. 

1.2 Probability Distributions 2 



The distribution $P_X$ of a random variable $X$ is simply a probability measure which assigns probabilities to
events on the real line. The distribution $P_X$ answers questions of the form:

What is the probability that $X$ lies in some subset $F$ of the real line?

In practice we summarize $P_X$ by its Probability Mass Function - pmf (for discrete variables only),
Probability Density Function - pdf (mainly for continuous variables), or Cumulative Distribution
Function - cdf (for either discrete or continuous variables).



2 This content is available online at <http://cnx.Org/content/ml0984/2.8/>. 




1.2.1 Probability Mass Function (pmf) 

Suppose the discrete random variable $X$ can take a set of $M$ real values $\{x_1, \ldots, x_M\}$, then the pmf is
defined as:

$$p_X(x_i) = \Pr[X = x_i] = P_X(\{x_i\})$$

where $\sum_{i=1}^{M} p_X(x_i) = 1$. For example, for a normal 6-sided die, $M = 6$ and $p_X(x_i) = \frac{1}{6}$. For a pair of dice being
thrown, $M = 11$ and the pmf is as shown in (a) of Figure 1.3.
Figure 1.3: Examples of pmfs, cdfs and pdfs: (a) to (c) for a discrete process, the sum of two dice; (d)
and (e) for a continuous process with a normal or Gaussian distribution, whose mean = 2 and variance
= 3.
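
The two-dice pmf of Figure 1.3 (a) can be reproduced by enumerating all 36 equally likely outcomes. The following is a minimal sketch, assuming two fair dice; the function name `two_dice_pmf` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def two_dice_pmf():
    """Return {total: probability} for the sum of two fair six-sided dice."""
    pmf = {}
    for a, b in product(range(1, 7), repeat=2):
        # Each of the 36 ordered outcomes (a, b) has probability 1/36.
        pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)
    return pmf

pmf = two_dice_pmf()
for total in sorted(pmf):
    print(total, pmf[total])
```

Exact fractions are used so that the normalisation $\sum_i p_X(x_i) = 1$ can be checked without rounding error.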

1.2.2 Cumulative Distribution Function (cdf)

The cdf can describe discrete, continuous or mixed distributions of X and is defined as: 

$$F_X(x) = \Pr[X \le x] = P_X((-\infty, x])$$

For discrete X:

$$F_X(x) = \sum_i \{\, p_X(x_i) \mid x_i \le x \,\} \tag{1.3}$$

giving step-like cdfs as in the example of (b) of Figure 1.3.

Properties follow directly from the Axioms of Probability:

1. $0 \le F_X(x) \le 1$

2. $F_X(-\infty) = 0$, $F_X(\infty) = 1$

3. $F_X(x)$ is non-decreasing as $x$ increases

4. $\Pr[x_1 < X \le x_2] = F_X(x_2) - F_X(x_1)$

5. $\Pr[X > x] = 1 - F_X(x)$

Where there is no ambiguity we will often drop the subscript $X$ and refer to the cdf as $F(x)$.
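
As an illustration of (1.3) and of properties 4 and 5, the sketch below builds the step-like cdf of the two-dice sum from its pmf. It assumes fair dice and uses the closed form $p(s) = (6 - |s - 7|)/36$ for the pmf; the variable names are illustrative only.

```python
from fractions import Fraction

# pmf of the sum of two fair dice: p(s) = (6 - |s - 7|) / 36 for s = 2..12
pmf = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

def cdf(x):
    """F_X(x): sum of p_X(x_i) over all x_i <= x, as in (1.3)."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))

# Property 4: Pr[x1 < X <= x2] = F(x2) - F(x1)
prob_5_to_8 = cdf(8) - cdf(4)      # Pr[X in {5, 6, 7, 8}]
# Property 5: Pr[X > x] = 1 - F(x)
prob_above_10 = 1 - cdf(10)        # Pr[X in {11, 12}]

print(cdf(1), cdf(7), cdf(12), prob_5_to_8, prob_above_10)
```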

1.2.3 Probability Density Function (pdf) 

The pdf of X is defined as the derivative of the cdf: 

$$f_X(x) = \frac{d}{dx} F_X(x) \tag{1.4}$$

The pdf can also be interpreted in derivative form as $\delta x \to 0$:

$$f_X(x)\,\delta x = \Pr[x < X \le x + \delta x] = F_X(x + \delta x) - F_X(x)$$

For a discrete random variable with pmf given by $p_X(x_i)$:

$$f_X(x) = \sum_{i=1}^{M} p_X(x_i)\,\delta(x - x_i) \tag{1.6}$$

An example of the pdf of the 2-dice discrete random process is shown in (c) of Figure 1.3. (Strictly the
delta functions should extend vertically to infinity, but we show them only reaching the values of their areas,
$p_X(x_i)$.)

The pdf and cdf of a continuous distribution (in this case the normal or Gaussian distribution) are 
shown in (d) and (e) of Figure 1.3. 

note: The cdf is the integral of the pdf and should always go from zero to unity for a valid 
probability distribution. 

Properties of pdfs: 

1. $f_X(x) \ge 0$

2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$

3. $F_X(x) = \int_{-\infty}^{x} f_X(\alpha)\,d\alpha$

4. $\Pr[x_1 < X \le x_2] = \int_{x_1}^{x_2} f_X(\alpha)\,d\alpha$

As for the cdf, we will often drop the subscript $X$ and refer simply to $f(x)$ when no confusion can arise.
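
These pdf properties can be checked numerically for the Gaussian of Figure 1.3 (d) and (e), which has mean 2 and variance 3. The sketch below is not part of the original notes: it assumes the standard closed form of the Gaussian cdf in terms of `math.erf` and uses a simple trapezoidal rule, with step counts chosen arbitrarily.

```python
import math

MEAN, VAR = 2.0, 3.0  # the Gaussian of Figure 1.3 (d) and (e)

def pdf(x):
    return math.exp(-(x - MEAN) ** 2 / (2 * VAR)) / math.sqrt(2 * math.pi * VAR)

def cdf(x):
    # Closed form of a Gaussian cdf in terms of the error function
    return 0.5 * (1 + math.erf((x - MEAN) / math.sqrt(2 * VAR)))

def integrate(f, a, b, steps=20000):
    """Trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    return h * (0.5 * f(a) + sum(f(a + k * h) for k in range(1, steps)) + 0.5 * f(b))

dx = 1e-5
slope = (cdf(1.0 + dx) - cdf(1.0 - dx)) / (2 * dx)  # numerical derivative of the cdf, cf. (1.4)
area = integrate(pdf, MEAN - 40, MEAN + 40)          # property 2: total area is unity
interval = integrate(pdf, 0.0, 3.0)                  # property 4: Pr[0 < X <= 3]

print(slope, pdf(1.0))
print(area)
print(interval, cdf(3.0) - cdf(0.0))
```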



1.3 Conditional Probabilities and Bayes' Rule 3 

If A and B are two separate but possibly dependent random events, then: 

1. Probability of A and B occurring together = Pr [A, B]

2. The conditional probability of A, given that B occurs = Pr [A | B]

3. The conditional probability of B, given that A occurs = Pr [B | A]

From elementary rules of probability (Venn diagrams):

$$\Pr[A, B] = \Pr[A \mid B]\,\Pr[B] = \Pr[B \mid A]\,\Pr[A] \tag{1.7}$$

Dividing the right-hand pair of expressions by Pr [B] gives Bayes' rule:

$$\Pr[A \mid B] = \frac{\Pr[B \mid A]\,\Pr[A]}{\Pr[B]} \tag{1.8}$$

In problems of probabilistic inference, we are often trying to estimate the most probable underlying model for 
a random process, based on some observed data or evidence. If A represents a given set of model parameters, 
and B represents the set of observed data values, then the terms in (1.8) are given the following terminology: 

• Pr [A] is the prior probability of the model A (in the absence of any evidence); 

• Pr [B] is the probability of the evidence B; 

• Pr [B | A] is the likelihood that the evidence B was produced, given that the model was A;

• Pr [A | B] is the posterior probability of the model being A, given that the evidence is B. 

Quite often, we try to find the model A which maximizes the posterior Pr [A | B]. This is known as
maximum a posteriori or MAP model selection.

The following example illustrates the concepts of Bayesian model selection. 

Example 1.3: Loaded Dice 

Problem: 

Given a tub containing 100 six-sided dice, in which one die is known to be loaded towards the
six to a specified extent, derive an expression for the probability that, after a given set of throws,
an arbitrarily chosen die is the loaded one. Assume the other 99 dice are all fair (not loaded in any
way). The loaded die is known to have the following pmf:

$$p_L(1) = 0.05$$

$$\{p_L(2), \ldots, p_L(5)\} = 0.15$$

$$p_L(6) = 0.35$$

Hence derive a good strategy for finding the loaded die from the tub.

Solution:

The pmfs of the fair dice may be assumed to be:

$$p_F(i) = \frac{1}{6}, \quad i \in \{1, \ldots, 6\}$$


3 This content is available online at <http://cnx.Org/content/ml0985/2.8/>. 
Let each die have one of two states, $S = L$ if it is loaded and $S = F$ if it is fair. These are our two
possible models for the random process and they have underlying pmfs given by $\{p_L(1), \ldots, p_L(6)\}$
and $\{p_F(1), \ldots, p_F(6)\}$ respectively.

After $N$ throws of the chosen die, let the sequence of throws be $\Theta_N = \{\theta_1, \ldots, \theta_N\}$, where each
$\theta_i \in \{1, \ldots, 6\}$. This is our evidence.

We shall now calculate the probability that this die is the loaded one. We therefore wish to find
the posterior $\Pr[S = L \mid \Theta_N]$.

We cannot evaluate this directly, but we can evaluate the likelihoods, $\Pr[\Theta_N \mid S = L]$ and
$\Pr[\Theta_N \mid S = F]$, since we know the expected pmfs in each case. We also know the prior probabilities
$\Pr[S = L]$ and $\Pr[S = F]$ before we have carried out any throws, and these are $\{0.01, 0.99\}$
since only one die in the tub of 100 is loaded. Hence we can use Bayes' rule:

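$$\Pr[S = L \mid \Theta_N] = \frac{\Pr[\Theta_N \mid S = L]\,\Pr[S = L]}{\Pr[\Theta_N]} \tag{1.9}$$

The denominator term $\Pr[\Theta_N]$ is there to ensure that $\Pr[S = L \mid \Theta_N]$ and $\Pr[S = F \mid \Theta_N]$
sum to unity (as they must). It can most easily be calculated from:

$$\begin{aligned}
\Pr[\Theta_N] &= \Pr[\Theta_N, S = L] + \Pr[\Theta_N, S = F] \\
&= \Pr[\Theta_N \mid S = L]\,\Pr[S = L] + \Pr[\Theta_N \mid S = F]\,\Pr[S = F]
\end{aligned} \tag{1.10}$$

so that

$$\begin{aligned}
\Pr[S = L \mid \Theta_N] &= \frac{\Pr[\Theta_N \mid S = L]\,\Pr[S = L]}{\Pr[\Theta_N \mid S = L]\,\Pr[S = L] + \Pr[\Theta_N \mid S = F]\,\Pr[S = F]} \\
&= \frac{1}{1 + R_N}
\end{aligned} \tag{1.11}$$

where

$$R_N = \frac{\Pr[\Theta_N \mid S = F]\,\Pr[S = F]}{\Pr[\Theta_N \mid S = L]\,\Pr[S = L]} \tag{1.12}$$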

To calculate the likelihoods, $\Pr[\Theta_N \mid S = L]$ and $\Pr[\Theta_N \mid S = F]$, we simply take the product
of the probabilities of each throw occurring in the sequence of throws $\Theta_N$, given each of the two
models respectively (since each new throw is independent of all previous throws, given the model).
So, after $N$ throws, these likelihoods will be given by:

$$\Pr[\Theta_N \mid S = L] = \prod_{i=1}^{N} p_L(\theta_i) \tag{1.13}$$

and

$$\Pr[\Theta_N \mid S = F] = \prod_{i=1}^{N} p_F(\theta_i) \tag{1.14}$$

We can now substitute these probabilities into the above expression for $R_N$ and include
$\Pr[S = L] = 0.01$ and $\Pr[S = F] = 0.99$ to get the desired a posteriori probability
$\Pr[S = L \mid \Theta_N]$ after $N$ throws using (1.11).

We may calculate this iteratively by noting that 

$$\Pr[\Theta_N \mid S = L] = \Pr[\Theta_{N-1} \mid S = L]\;p_L(\theta_N) \tag{1.15}$$

and

$$\Pr[\Theta_N \mid S = F] = \Pr[\Theta_{N-1} \mid S = F]\;p_F(\theta_N) \tag{1.16}$$

so that

$$R_N = R_{N-1}\,\frac{p_F(\theta_N)}{p_L(\theta_N)} \tag{1.17}$$

where $R_0 = \frac{\Pr[S = F]}{\Pr[S = L]} = 99$. If we calculate this after every throw of the current die being
tested (i.e. as $N$ increases), then we can either move on to test the next die from the tub if
$\Pr[S = L \mid \Theta_N]$ becomes sufficiently small (say $< 10^{-4}$) or accept the current die as the loaded
one when $\Pr[S = L \mid \Theta_N]$ becomes large enough (say $> 0.995$). (These thresholds correspond
approximately to $R_N > 10^4$ and $R_N < 5 \times 10^{-3}$ respectively.)
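
The iterative update of (1.17) and the two thresholds can be sketched in a few lines. This is an illustrative sketch only: the function name `classify_die`, its return format and the sample throw sequences are assumptions made here, not part of the original example.

```python
P_LOADED = {1: 0.05, 2: 0.15, 3: 0.15, 4: 0.15, 5: 0.15, 6: 0.35}
P_FAIR = 1.0 / 6.0
LOWER, UPPER = 1e-4, 0.995   # thresholds on Pr[S = L | throws]

def classify_die(throws, prior_loaded=0.01):
    """Update R_N with (1.17) after each throw and stop at a threshold.

    Returns (decision, posterior, throws_used), where decision is
    'fair', 'loaded' or 'undecided' (hypothetical labels).
    """
    ratio = (1.0 - prior_loaded) / prior_loaded    # R_0 = 99
    posterior = prior_loaded
    used = 0
    for used, theta in enumerate(throws, start=1):
        ratio *= P_FAIR / P_LOADED[theta]          # (1.17)
        posterior = 1.0 / (1.0 + ratio)            # (1.11)
        if posterior < LOWER:
            return "fair", posterior, used
        if posterior > UPPER:
            return "loaded", posterior, used
    return "undecided", posterior, used

print(classify_die([6] * 20))   # a run of sixes favours the loaded model
print(classify_die([1] * 10))   # a run of ones favours the fair model
```

Because each six multiplies $R_N$ by $(1/6)/0.35 \approx 0.476$ while each one multiplies it by $(1/6)/0.05 \approx 3.33$, a die is discarded as fair much faster than it is accepted as loaded.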

The choice of these thresholds for Pr [S = L | 0^] is a function of the desired tradeoff between 
speed of searching versus the probability of failure to find the loaded die, either by moving on to 
the next die even when the current one is loaded, or by selecting a fair die as the loaded one. 

The lower threshold, $p_1 = 10^{-4}$, is the more critical, because it affects how long we spend before
discarding each fair die. The probability of correctly detecting all the fair dice before the loaded
die is reached is $(1 - p_1)^n \approx 1 - n p_1$, where $n \approx 50$ is the expected number of fair dice tested before
the loaded one is found. So the failure probability due to incorrectly assuming the loaded die to be
fair is approximately $n p_1 \approx 0.005$.

The upper threshold, $p_2 = 0.995$, is much less critical on search speed, since the loaded
result only occurs once, so it is a good idea to set it very close to unity. The failure probability
caused by selecting a fair die to be the loaded one is just $1 - p_2 = 0.005$. Hence the
overall failure probability = 0.005 + 0.005 = 0.01.

note: In problems with significant amounts of evidence (e.g. large $N$), the evidence probability
and the likelihoods can both become very small, sufficient to cause floating-point underflow on
many computers if equations such as (1.13) and (1.14) are computed directly. However the ratio
of likelihood to evidence probability still remains a reasonable size and is an important quantity
which must be calculated correctly.

One solution to this problem is to compute only the ratio of likelihoods, as in (1.17). A more
generally useful solution is to compute log(likelihoods) instead. The product operations in the
expressions for the likelihoods then become sums of logarithms. Even the calculation of likelihood
ratios such as $R_N$ and comparison with appropriate thresholds can be done in the log domain.
After this, it is acceptable to return to the linear domain if necessary, since $R_N$ should be a reasonable
value as it is the ratio of very small quantities.
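
The underflow problem and the log-domain remedy can be demonstrated directly. The sketch below assumes the loaded-die pmf of this example and an artificial sequence of 6000 throws; the helper names `log_ratio` and `posterior_from_log_ratio` are hypothetical.

```python
import math

P_LOADED = {1: 0.05, 2: 0.15, 3: 0.15, 4: 0.15, 5: 0.15, 6: 0.35}
P_FAIR = 1.0 / 6.0

def log_ratio(throws, prior_loaded=0.01):
    """Return ln(R_N), accumulated as a sum of log terms instead of a product."""
    log_r = math.log((1.0 - prior_loaded) / prior_loaded)   # ln(R_0) = ln(99)
    for theta in throws:
        log_r += math.log(P_FAIR) - math.log(P_LOADED[theta])
    return log_r

def posterior_from_log_ratio(log_r):
    """Pr[S = L | throws] = 1 / (1 + R_N), evaluated without overflow."""
    if log_r > 0:
        # R_N is large: use the equivalent form exp(-ln R) / (1 + exp(-ln R))
        e = math.exp(-log_r)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(log_r))

throws = [1, 2, 3, 4, 5, 6] * 1000     # an artificial, perfectly even sequence
direct_likelihood = math.prod(P_LOADED[t] for t in throws)   # (1.13) computed directly

print(direct_likelihood)               # the direct product underflows
print(log_ratio(throws))               # the log-domain ratio stays finite
print(posterior_from_log_ratio(log_ratio(throws)))
```

Thresholds can be compared in the log domain as well, for example by testing `log_ratio(...)` against `math.log(1e4)` and `math.log(5e-3)`.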



Figure 1.4: Probabilities of the current die being the loaded one as the throws progress (20th die is
the loaded one). A new die is selected whenever the probability falls below $p_1$.

[Figure 1.5 panels: "Surface plot of histograms of throws" and "Image plot of histograms of throws"; axes: value of throw, no. of throws.]

Figure 1.5: Histograms of the dice throws as the throws progress. Histograms are reset when each
new die is selected.

1.4 Joint and Conditional cdfs and pdfs 4

1.4.1 Cumulative distribution functions 

We define the joint cdf to be 

$$F(x, y) = \Pr[(X \le x) \cap (Y \le y)] \tag{1.18}$$

and conditional cdf to be

$$F(x \mid y) = \Pr[X \le x \mid Y \le y] \tag{1.19}$$

Hence we get the following rules:

• Conditional probability (cdf):

$$F(x \mid y) = \Pr[X \le x \mid Y \le y] = \frac{\Pr[(X \le x) \cap (Y \le y)]}{\Pr[Y \le y]} = \frac{F(x, y)}{F_Y(y)} \tag{1.20}$$

• Bayes Rule (cdf):

$$F(x \mid y) = \frac{F(y \mid x)\,F(x)}{F(y)} \tag{1.21}$$

• Total probability (cdf):

$$F(x, \infty) = F(x) \tag{1.22}$$

which follows because the event $Y \le \infty$ itself forms a partition of the sample space.

Conditional cdfs have similar properties to standard cdfs, i.e.

$$F_{X|Y}(-\infty \mid y) = 0$$

$$F_{X|Y}(\infty \mid y) = 1$$

1.4.2 Probability density functions 

We define joint and conditional pdfs in terms of corresponding cdfs. The joint pdf is defined to be

$$f(x, y) = \frac{\partial^2 F(x, y)}{\partial x\,\partial y}$$

and the conditional pdf is defined to be

$$f(x \mid y) = \frac{\partial}{\partial x} F'(x \mid Y = y) \tag{1.24}$$

where

$$F'(x \mid Y = y) = \Pr[X \le x \mid Y = y]$$

Note that $F'(x \mid Y = y)$ is different from the conditional cdf $F(x \mid y) = \Pr[X \le x \mid Y \le y]$ previously defined, and there is a
slight problem. The event $Y = y$ has zero probability for continuous random variables, hence probability
conditional on $Y = y$ is not directly defined and $F'(x \mid Y = y)$ cannot be found by direct application of
event-based probability. However, all is well if we consider it as a limiting case:

$$\begin{aligned}
F'(x \mid Y = y) &= \lim_{\delta y \to 0} \Pr[X \le x \mid y < Y \le y + \delta y] \\
&= \lim_{\delta y \to 0} \frac{F(x, y + \delta y) - F(x, y)}{F_Y(y + \delta y) - F_Y(y)} \\
&= \frac{\partial F(x, y) / \partial y}{f_Y(y)}
\end{aligned} \tag{1.25}$$

4 This content is available online at <http://cnx.Org/content/ml0986/2.8/>. 
Joint and conditional pdfs have similar properties and interpretation to ordinary pdfs:

$$f(x, y) \ge 0$$

$$\iint f(x, y)\,dx\,dy = 1$$

$$f(x \mid y) \ge 0$$

$$\int f(x \mid y)\,dx = 1$$

note: From now on interpret $\int$ as $\int_{-\infty}^{\infty}$ unless otherwise stated.

For pdfs we get the following rules:

• Conditional pdf:

$$f(x \mid y) = \frac{f(x, y)}{f(y)} \tag{1.26}$$

• Bayes Rule (pdf):

$$f(x \mid y) = \frac{f(y \mid x)\,f(x)}{f(y)} \tag{1.27}$$

• Total Probability (pdf):

$$\begin{aligned}
\int f(y \mid x)\,f(x)\,dx &= \int f(y, x)\,dx \\
&= f(y) \int f(x \mid y)\,dx \\
&= f(y)
\end{aligned} \tag{1.28}$$

The final result is often referred to as the Marginalisation Integral and $f(y)$ as the Marginal
Probability.
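
The marginalisation integral (1.28) and Bayes' rule for pdfs (1.27) can be checked numerically on a simple case. The sketch below assumes, purely for illustration, that $X \sim N(0, 1)$ and that $Y$ given $X = x$ is $N(x, 1)$; for this choice the marginal of $Y$ is known to be $N(0, 2)$ and the posterior of $X$ given $Y = y$ is $N(y/2, 1/2)$, which gives something to compare against.

```python
import math

def gauss(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def f_x(x):
    return gauss(x, 0.0, 1.0)          # assumed prior pdf: X ~ N(0, 1)

def f_y_given_x(y, x):
    return gauss(y, x, 1.0)            # assumed conditional pdf: Y | X = x ~ N(x, 1)

def integrate(f, a=-12.0, b=12.0, steps=4800):
    """Trapezoidal rule; the limits stand in for -infinity..infinity."""
    h = (b - a) / steps
    return h * (0.5 * f(a) + sum(f(a + k * h) for k in range(1, steps)) + 0.5 * f(b))

def f_y(y):
    """Marginalisation integral (1.28): f(y) = integral of f(y|x) f(x) dx."""
    return integrate(lambda x: f_y_given_x(y, x) * f_x(x))

def f_x_given_y(x, y):
    """Bayes' rule for pdfs (1.27)."""
    return f_y_given_x(y, x) * f_x(x) / f_y(y)

print(f_y(1.0), gauss(1.0, 0.0, 2.0))
print(f_x_given_y(0.3, 1.0), gauss(0.3, 0.5, 0.5))
```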

Chapter 2

Random Vectors, Signals and Functions 



2.1 Random Vectors 1 



Random Vectors are simply groups of random variables, arranged as vectors. E.g.:

$$\mathbf{X} = (X_1 \;\; X_2 \;\; \ldots \;\; X_n)^T \tag{2.1}$$

where $X_1, \ldots, X_n$ are $n$ separate random variables.

In general, all of the previous results can be applied to random vectors as well as to random scalars, but 
vectors allow some interesting new results too. 



[Figure 2.1 panels: (a) pdf of 2-D normal distribution: mean = 0, var = 1; (b) pdf of Rayleigh distribution: var = 2]

Figure 2.1: pdfs of (a) a 2-D normal distribution and (b) a Rayleigh distribution, corresponding to
the magnitude of the 2-D random vectors.



1 This content is available online at <http://cnx.Org/content/ml0988/2.5/>. 

2.1.1 Example - Arrows on a target

Suppose that arrows are shot at a target and land at random distances from the target centre. The horizontal
and vertical components of these distances are formed into a 2-D random error vector. If each component of
this error vector is an independent variable with zero-mean Gaussian pdf of variance $\sigma^2$, calculate the pdfs
of the radial magnitude and the phase angle of the error vector.

Let the error vector be

$$\mathbf{X} = (X_1 \;\; X_2)^T \tag{2.2}$$

$X_1$ and $X_2$ each have a zero-mean Gaussian pdf given by

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{x^2}{2\sigma^2}} \tag{2.3}$$

Since $X_1$ and $X_2$ are independent, the 2-D pdf of $\mathbf{X}$ is

$$f_X(x_1, x_2) = f(x_1)\,f(x_2) = \frac{1}{2\pi\sigma^2}\,e^{-\frac{x_1^2 + x_2^2}{2\sigma^2}}$$

In polar coordinates

$$x_1 = r\cos(\theta)$$

and

$$x_2 = r\sin(\theta)$$

To calculate the radial pdf, we substitute $r = \sqrt{x_1^2 + x_2^2}$ in the above 2-D pdf to get:

$$\int_r^{r+\delta r}\!\int_{-\pi}^{\pi} f_X(x_1, x_2)\,R\,d\theta\,dR \approx \delta r \int_{-\pi}^{\pi} \frac{1}{2\pi\sigma^2}\,e^{-\frac{r^2}{2\sigma^2}}\,r\,d\theta = \frac{1}{\sigma^2}\,r\,e^{-\frac{r^2}{2\sigma^2}}\,\delta r \tag{2.4}$$

where

$$\Pr[r < R \le r + \delta r] = \int_r^{r+\delta r}\!\int_{-\pi}^{\pi} f_X(x_1, x_2)\,R\,d\theta\,dR \tag{2.5}$$

Hence the radial pdf of the error vector is:

$$f_R(r) = \lim_{\delta r \to 0} \frac{\Pr[r < R \le r + \delta r]}{\delta r} = \frac{1}{\sigma^2}\,r\,e^{-\frac{r^2}{2\sigma^2}} \tag{2.6}$$

This is a Rayleigh distribution with mean-square value $2\sigma^2$ (there are two components of $\mathbf{X}$, each with variance
$\sigma^2$).

The 2-D pdf of $\mathbf{X}$ depends only on $r$ and not on $\theta$, so the angular pdf of the error vector is constant over
any $2\pi$ interval and is therefore

$$f_\Theta(\theta) = \frac{1}{2\pi}$$

so that

$$\int_{-\pi}^{\pi} f_\Theta(\theta)\,d\theta = 1$$
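
The radial and angular results can be checked by simulation. The following Monte Carlo sketch is an illustration only: the value of $\sigma$, the sample count and the seed are arbitrary assumptions. It compares the fraction of arrows landing within a radius $r$ against the integral of (2.6), which is $1 - e^{-r^2/(2\sigma^2)}$, and checks that the mean-square radius is close to $2\sigma^2$ and that the angle is evenly split about zero.

```python
import math
import random

def simulate_arrows(sigma, count, seed=0):
    """Draw `count` error vectors with independent N(0, sigma^2) components
    and return them as (radius, angle) pairs."""
    rng = random.Random(seed)
    polar = []
    for _ in range(count):
        x1 = rng.gauss(0.0, sigma)
        x2 = rng.gauss(0.0, sigma)
        polar.append((math.hypot(x1, x2), math.atan2(x2, x1)))
    return polar

def rayleigh_cdf(r, sigma):
    """Integral of the radial pdf (2.6) from 0 to r."""
    return 1.0 - math.exp(-r * r / (2.0 * sigma * sigma))

sigma = 1.5
samples = simulate_arrows(sigma, 100_000)
mean_square = sum(r * r for r, _ in samples) / len(samples)
frac_inside = sum(1 for r, _ in samples if r <= sigma) / len(samples)
frac_upper_half = sum(1 for _, th in samples if th > 0) / len(samples)

print(mean_square, 2 * sigma ** 2)
print(frac_inside, rayleigh_cdf(sigma, sigma))
print(frac_upper_half)
```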




2.2 Random Signals 2 

Random signals are random variables which evolve, often with time (e.g. audio noise), but also with distance 
(e.g. intensity in an image of a random texture), or sometimes another parameter. 

They can be described as usual by their cdf and either their pmf (if the amplitude is discrete, as in a 
digitized signal) or their pdf (if the amplitude is continuous, as in most analogue signals). 

However a very important additional property is how rapidly a random signal fluctuates. Clearly a slowly 
varying signal such as the waves in an ocean is very different from a rapidly varying signal such as vibrations 
in a vehicle. We will see later in Section 2.3 how to deal with these frequency dependent characteristics of 
randomness. 

For the moment we shall assume that random signals are sampled at regular intervals and that each 
signal is equivalent to a sequence of samples of a given random process, as in the following examples. 



2 This content is available online at <http://cnx.Org/content/ml0989/2.5/>. 
[Figure 2.2 panels recovered from the original: (c) Filtered noise; (d) Signal + noise at the detector]

Figure 2.2: Detection of signals in noise: (a) the transmitted binary signal; (b) the binary signal after
filtering with a half-sine receiver filter; (c) the channel noise after filtering with the same filter; (d) the
filtered signal plus noise at the detector in the receiver.

Figure 2.3: The pdfs of the signal plus noise at the detector for the two data values ±1. The vertical dashed
line is the detector threshold and the shaded area to the left of the origin represents the probability of
error when data = 1.



2.2.1 Example - Detection of a binary signal in noise 

We now consider the example of detecting a binary signal after it has passed through a channel which adds 
noise. The transmitted signal is typically as shown in (a) of Figure 2.2. 

In order to reduce the channel noise, the receiver will include a lowpass filter. The aim of the filter is 
to reduce the noise as much as possible without reducing the peak values of the signal significantly. A good 
filter for this has a half-sine impulse response of the form: 



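$$h(t) = \begin{cases} \dfrac{\pi}{2 T_b} \sin\!\left(\dfrac{\pi t}{T_b}\right) & \text{if } 0 \le t \le T_b \\[2ex] 0 & \text{otherwise} \end{cases} \tag{2.7}$$

where $T_b$ = bit period.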

This filter will convert the rectangular data bits into sinusoidally shaped pulses as shown in (b) of 
Figure 2.2 and it will also convert wide bandwidth channel noise into the form shown in (c) of Figure 2.2. 
Bandlimited noise of this form will usually have an approximately Gaussian pdf. 

Because this filter has an impulse response limited to just one bit period and has unit gain at zero
frequency (the area under $h(t)$ is unity), the signal values at the center of each bit period at the detector
will still be ±1. If we choose to sample each bit at the detector at this optimal mid point, the pdfs of the
signal plus noise at the detector will be as shown in Figure 2.3.

Let the filtered data signal be $D(t)$ and the filtered noise be $U(t)$, then the detector signal is

$$R(t) = D(t) + U(t) \tag{2.8}$$

If we assume that +1 and -1 bits are equiprobable and the noise is a symmetric zero-mean process, the
optimum detector threshold is clearly midway between these two states, i.e. at zero. The probability of error
when the data = +1 is then given by:

$$\begin{aligned}
\Pr[\text{error} \mid D = +1] &= \Pr[R(t) < 0 \mid D = +1] \\
&= F_U(-1) \\
&= \int_{-\infty}^{-1} f_U(u)\,du
\end{aligned} \tag{2.9}$$
where $F_U$ and $f_U$ are the cdf and pdf of $U$. This is the shaded area in Figure 2.3.

Similarly the probability of error when the data = -1 is then given by:

$$\begin{aligned}
\Pr[\text{error} \mid D = -1] &= \Pr[R(t) > 0 \mid D = -1] \\
&= 1 - F_U(+1) \\
&= \int_{1}^{\infty} f_U(u)\,du
\end{aligned} \tag{2.10}$$

Hence the overall probability of error is:

$$\begin{aligned}
\Pr[\text{error}] &= \Pr[\text{error} \mid D = +1]\,\Pr[D = +1] + \Pr[\text{error} \mid D = -1]\,\Pr[D = -1] \\
&= \int_{-\infty}^{-1} f_U(u)\,du\;\Pr[D = +1] + \int_{1}^{\infty} f_U(u)\,du\;\Pr[D = -1]
\end{aligned} \tag{2.11}$$

since $f_U$ is symmetric about zero

$$\Pr[\text{error}] = \int_{1}^{\infty} f_U(u)\,du\;\big(\Pr[D = +1] + \Pr[D = -1]\big) = \int_{1}^{\infty} f_U(u)\,du$$

To be a little more general and to account for signal attenuation over the channel, we shall assume that the
signal values at the detector are $\pm v_0$ (rather than ±1) and that the filtered noise at the detector has a
zero-mean Gaussian pdf with variance $\sigma^2$:

$$f_U(u) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{u^2}{2\sigma^2}} \tag{2.12}$$

and so

$$\begin{aligned}
\Pr[\text{error}] &= \int_{v_0}^{\infty} f_U(u)\,du \\
&= \int_{v_0/\sigma}^{\infty} f_U(\sigma u)\,\sigma\,du \\
&= Q\!\left(\frac{v_0}{\sigma}\right)
\end{aligned} \tag{2.13}$$

where

$$Q(x) = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-\frac{u^2}{2}}\,du \tag{2.14}$$

This integral has no analytic solution, but a good approximation to it exists and is discussed in some detail
in Section 2.3.

From (2.13) we may obtain the probability of error in the binary detector, which is often expressed as
the bit error rate or BER. For example, if $\Pr[\text{error}] = 2 \times 10^{-3}$, this would often be expressed as a bit
error rate of $2 \times 10^{-3}$, or alternatively as 1 error in 500 bits (on average).

The argument $\left(\frac{v_0}{\sigma}\right)$ in (2.13) is the signal-to-noise voltage ratio (SNR) at the detector, and the BER
rapidly diminishes with increasing SNR (see Figure 2.4).
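
To get a feel for how quickly the BER falls with SNR, (2.13) can be evaluated numerically. The sketch below assumes the standard identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$ and uses `math.erfc` from the Python standard library; the function names are illustrative.

```python
import math

def q_function(x):
    """Gaussian error integral Q(x) of (2.14), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bit_error_rate(v0, sigma):
    """Pr[error] = Q(v0 / sigma), as in (2.13)."""
    return q_function(v0 / sigma)

for snr in (1.0, 2.0, 3.0, 4.0, 5.0):
    print(f"v0/sigma = {snr:.0f}: BER = {bit_error_rate(snr, 1.0):.3e}")
```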

2.3 Approximation Formulae for the Gaussian Error Integral, Q(x) 3 

A Gaussian pdf with unit variance is given by: 

$$f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}} \tag{2.15}$$

The probability that a signal with a pdf given by $f(x)$ lies above a given threshold $x$ is given by the Gaussian
Error Integral or $Q$ function:

$$Q(x) = \int_{x}^{\infty} f(u)\,du \tag{2.16}$$



3 This content is available online at <http://cnx.Org/content/mll067/2.4/>. 




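There is no analytical solution to this integral, but it has a simple relationship to the error function, erf(x),
or its complement, erfc(x), which are tabulated in many books of mathematical tables.

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-u^2}\,du \tag{2.17}$$

and

$$\mathrm{erfc}(x) = 1 - \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{\infty} e^{-u^2}\,du \tag{2.18}$$

Therefore,

$$Q(x) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{x}{\sqrt{2}}\right) = \frac{1}{2}\left[1 - \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right] \tag{2.19}$$

Note that erf(0) = 0 and erf(∞) = 1, and therefore $Q(0) = 0.5$ and $Q(x) \to 0$ very rapidly as $x$ becomes
large.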

It is useful to derive simple approximations to Q (x) which can be used on a calculator and avoid the 
need for tables. 

Let $v = u - x$, then:

$$\begin{aligned}
Q(x) &= \int_{0}^{\infty} f(v + x)\,dv \\
&= \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-\frac{v^2 + 2vx + x^2}{2}}\,dv \\
&= \frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-vx}\,e^{-\frac{v^2}{2}}\,dv
\end{aligned} \tag{2.20}$$

Now if $x \gg 1$, we may obtain an approximate solution by replacing the $e^{-\frac{v^2}{2}}$ term in the integral by unity,
since it will initially decay much slower than the $e^{-vx}$ term. Therefore

$$Q(x) < \frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-vx}\,dv = \frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}\,x} \tag{2.21}$$

This approximation is an upper bound, and its ratio to the true value of $Q(x)$ becomes less than 1.1 only
when $x > 3$, as shown in Figure 2.4. We may obtain a much better approximation to $Q(x)$ by altering the
denominator above from $\left(\sqrt{2\pi}\,x\right)$ to $\left(1.64x + \sqrt{0.76x^2 + 4}\right)$ to give:

$$Q(x) \approx \frac{e^{-\frac{x^2}{2}}}{1.64x + \sqrt{0.76x^2 + 4}} \tag{2.22}$$

This improved approximation gives a curve indistinguishable from $Q(x)$ in Figure 2.4 and its ratio to the
true $Q(x)$ is now within ±0.3% of unity for all $x \ge 0$ as shown in Figure 2.5. This accuracy is sufficient
for nearly all practical problems.
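
The quality of the two approximations can be examined numerically. In the sketch below the reference value is taken as $\frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$, computed with `math.erfc`; this reference, the function names and the chosen sample points are assumptions for illustration, not the numerical integration used to produce Figure 2.5.

```python
import math

def q_exact(x):
    """Reference value of Q(x) via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_simple(x):
    """Upper bound (2.21); requires x > 0."""
    return math.exp(-x * x / 2.0) / (math.sqrt(2.0 * math.pi) * x)

def q_improved(x):
    """Improved approximation (2.22); usable for x >= 0."""
    return math.exp(-x * x / 2.0) / (1.64 * x + math.sqrt(0.76 * x * x + 4.0))

for x in (0.5, 1.0, 2.0, 3.0, 4.0, 5.0):
    print(f"x = {x}: simple/Q = {q_simple(x) / q_exact(x):.4f}, "
          f"improved/Q = {q_improved(x) / q_exact(x):.4f}")
```

The printed ratios show the simple bound approaching unity only slowly, while the improved form stays close to unity over the whole range.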



[Figure 2.4 plot: $Q(x)$ and the simple approximation on a logarithmic vertical axis, for $x$ from 0 to 5.]

Figure 2.4: Q (x) and the simple approximation of (2.21)

Figure 2.5: The ratio of the improved approximation of Q (x) in (2.22) to the true value, obtained
by numerical integration.

Chapter 3

Expectations, Moments, and 
Characteristic Functions 

3.1 Expectation 1 

Expectations form a fundamental part of random signal theory. In simple terms the Expectation Operator 
calculates the mean of a random quantity although the concept turns out to be much more general and 
useful than just this. 

If $X$ has pdf $f_X(x)$ (correctly normalised so that $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$), its expectation is given by:

$$E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx = \bar{X} \tag{3.1}$$

For discrete processes, we substitute (1.6) here to get

$$\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} x \sum_{i=1}^{M} p_X(x_i)\,\delta(x - x_i)\,dx \\
&= \sum_{i=1}^{M} x_i\,p_X(x_i) \\
&= \bar{X}
\end{aligned} \tag{3.2}$$

Now, what is the mean value of some function, $Y = g(X)$?

Using the result for pdfs of related processes $Y$ and $X$ (footnote 2):

$$f_Y(y)\,dy = f_X(x)\,dx \tag{3.3}$$

Hence (again assuming infinite integral limits unless stated otherwise)

$$\begin{aligned}
E[g(X)] &= E[Y] \\
&= \int y\,f_Y(y)\,dy \\
&= \int g(x)\,f_X(x)\,dx
\end{aligned} \tag{3.4}$$

This is an important result which allows us to use the Expectation Operator for many purposes including
the calculation of moments and other related parameters of a random process.

Note, expectation is a Linear Operator:

$$E[a\,g_1(X) + b\,g_2(X)] = a\,E[g_1(X)] + b\,E[g_2(X)] \tag{3.5}$$
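
For a discrete variable, (3.4) reduces to a weighted sum over the pmf, which makes the linearity property (3.5) easy to verify. The sketch below assumes a fair six-sided die; the helper name `expect` and the constants `a` and `b` are chosen arbitrarily for illustration.

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}   # fair six-sided die

def expect(g, pmf):
    """E[g(X)] as the sum of g(x_i) p_X(x_i): the discrete form of (3.4)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x, pmf)                  # E[X]
mean_square = expect(lambda x: x * x, pmf)       # E[X^2]

# Linearity (3.5) with g1(X) = X, g2(X) = X^2
a, b = 2, -3
lhs = expect(lambda x: a * x + b * x * x, pmf)
rhs = a * mean + b * mean_square

print(mean, mean_square, lhs, rhs)
```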



1 This content is available online at <http://cnx.Org/content/mll068/2.4/>. 
2 "Functions of Random Variables", (2) <http://cnx.Org/content/mll066/latest/#eql9> 



3.2 Important Examples of Expectation 3 

We get Moments of a pdf by setting $g(X) = X^n$ in (3.4),
nth order moment

$$E[X^n] = \int x^n f_X(x)\,dx \tag{3.6}$$

• n = 1: 1st order moment, $E[X]$ = Mean value

• n = 2: 2nd order moment, $E[X^2]$ = Mean-squared value (Power or energy)

• n > 2: Higher order moments, $E[X^n]$, give more detail about $f_X(x)$.

3.2.1 Central Moments

Central moments are moments about the centre or mean of a distribution,
nth order central moment

$$E\big[(X - \bar{X})^n\big] = \int (x - \bar{X})^n f_X(x)\,dx \tag{3.7}$$

Some important parameters from central moments of a pdf are:

• Variance, n = 2:

$$\begin{aligned}
\sigma^2 &= E\big[(X - \bar{X})^2\big] = \int (x - \bar{X})^2 f_X(x)\,dx \\
&= \int x^2 f_X(x)\,dx - 2\bar{X} \int x\,f_X(x)\,dx + \bar{X}^2 \int f_X(x)\,dx \\
&= E[X^2] - 2\bar{X}^2 + \bar{X}^2 \\
&= E[X^2] - \bar{X}^2
\end{aligned} \tag{3.8}$$

• Standard deviation, $\sigma = \sqrt{\text{variance}}$.

• Skewness, n = 3:

$$\gamma = E\left[\left(\frac{X - \bar{X}}{\sigma}\right)^3\right] \tag{3.9}$$

$\gamma = 0$ if the pdf of $X$ is symmetric about $\bar{X}$, and becomes more positive if the tail of the distribution
is heavier when $X > \bar{X}$.

• Kurtosis (or excess), n = 4:

$$\kappa = E\left[\left(\frac{X - \bar{X}}{\sigma}\right)^4\right] - 3 \tag{3.10}$$

$\kappa = 0$ for a Gaussian pdf and becomes more positive for distributions with heavier tails.

note: Skewness and kurtosis are normalized by dividing the central moments by appropriate
powers of $\sigma$ to make them dimensionless. Kurtosis is usually offset by -3 to make it zero for
Gaussian pdfs.
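
For a discrete pmf, the central moments become finite sums, so variance, skewness and kurtosis can be computed directly. The sketch below applies the definitions (3.7) to (3.10) to a fair die and to the loaded die of Example 1.3; the function names are hypothetical.

```python
import math

def central_moment(pmf, n):
    """nth central moment of a discrete pmf given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** n * p for x, p in pmf.items())

def summary(pmf):
    """Return (mean, variance, skewness, kurtosis) following (3.8) to (3.10)."""
    mean = sum(x * p for x, p in pmf.items())
    var = central_moment(pmf, 2)
    sigma = math.sqrt(var)
    skew = central_moment(pmf, 3) / sigma ** 3
    kurt = central_moment(pmf, 4) / sigma ** 4 - 3.0   # offset so a Gaussian gives 0
    return mean, var, skew, kurt

fair = {x: 1.0 / 6.0 for x in range(1, 7)}
loaded = {1: 0.05, 2: 0.15, 3: 0.15, 4: 0.15, 5: 0.15, 6: 0.35}

print("fair:  ", summary(fair))
print("loaded:", summary(loaded))
```

The fair die is symmetric about its mean and so has zero skewness, whereas the loaded die has its heavier tail below the mean and therefore a negative skewness.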



3 This content is available online at <http://cnx.Org/content/mll069/2.3/>. 
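
3.2.2 Example: Central Moments of a Normal Distribution

The normal (or Gaussian) pdf with zero mean is given by:

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{x^2}{2\sigma^2}} \tag{3.11}$$

What is the nth order central moment for the Gaussian?

Since the mean is zero, the nth order central moment is given by

$$\begin{aligned}
E[X^n] &= \int x^n f_X(x)\,dx \\
&= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} x^n e^{-\frac{x^2}{2\sigma^2}}\,dx
\end{aligned}$$

$f_X(x)$ is a function of $x^2$ and therefore is symmetric about zero. So all the odd-order moments will integrate
to zero (including the 1st-order moment, giving zero mean). The even-order moments are then given by:

$$E[X^n] = \frac{2}{\sqrt{2\pi\sigma^2}} \int_{0}^{\infty} x^n e^{-\frac{x^2}{2\sigma^2}}\,dx \tag{3.13}$$

where $n$ is even. The integral is calculated by substituting $u = \frac{x^2}{2\sigma^2}$ to give:

$$\begin{aligned}
\int_{0}^{\infty} x^n e^{-\frac{x^2}{2\sigma^2}}\,dx &= \frac{1}{2}\,(2\sigma^2)^{\frac{n+1}{2}} \int_{0}^{\infty} u^{\frac{n-1}{2}} e^{-u}\,du \\
&= \frac{1}{2}\,(2\sigma^2)^{\frac{n+1}{2}}\,\Gamma\!\left(\frac{n+1}{2}\right)
\end{aligned} \tag{3.14}$$

Here $\Gamma(z)$ is the Gamma function, which is defined as an integral for all real $z > 0$ and is rather like the
factorial function but generalized to allow non-integer arguments. Values of the Gamma function can be
found in mathematical tables. It is defined as follows:

$$\Gamma(z) = \int_{0}^{\infty} u^{z-1} e^{-u}\,du \tag{3.15}$$

and has the important (factorial-like) property that

$$\Gamma(z + 1) = z\,\Gamma(z), \quad z \ne 0 \tag{3.16}$$

$$\Gamma(z + 1) = z!, \quad z \in \mathbb{Z} \wedge (z \ge 0) \tag{3.17}$$

The following results hold for the Gamma function (see below for a way to evaluate $\Gamma\!\left(\frac{1}{2}\right)$ etc.):

$$\Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi} \tag{3.18}$$

$$\Gamma(1) = 1 \tag{3.19}$$

and hence

$$\Gamma\!\left(\tfrac{3}{2}\right) = \frac{\sqrt{\pi}}{2} \tag{3.20}$$

$$\Gamma(2) = 1 \tag{3.21}$$

Hence

$$E[X^n] = \begin{cases} 0 & \text{if } n \text{ is odd} \\[1ex] \dfrac{1}{\sqrt{\pi}}\,(2\sigma^2)^{\frac{n}{2}}\,\Gamma\!\left(\dfrac{n+1}{2}\right) & \text{if } n \text{ is even} \end{cases} \tag{3.22}$$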
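
• Valid pdf, n = 0:

$$E[X^0] = \frac{1}{\sqrt{\pi}}\,\Gamma\!\left(\tfrac{1}{2}\right) = 1 \tag{3.23}$$

as required for a valid pdf.

note: The normalization factor $\frac{1}{\sqrt{2\pi\sigma^2}}$ in the expression for the pdf of a Gaussian (e.g. (3.11)) arises directly from the above result.

• Mean, n = 1:

$$E[X] = 0 \tag{3.24}$$

so the mean is zero.

• Variance, n = 2:

$$E\big[(X - \bar{X})^2\big] = E[X^2] = \frac{1}{\sqrt{\pi}}\,(2\sigma^2)\,\Gamma\!\left(\tfrac{3}{2}\right) = \sigma^2 \tag{3.25}$$

Therefore standard deviation = $\sqrt{\text{variance}} = \sigma$.

• Skewness, n = 3:

$$E[X^3] = 0 \tag{3.26}$$

so the skewness is zero.

• Kurtosis, n = 4:

$$E\big[(X - \bar{X})^4\big] = E[X^4] = \frac{1}{\sqrt{\pi}}\,(2\sigma^2)^2\,\Gamma\!\left(\tfrac{5}{2}\right) = \frac{1}{\sqrt{\pi}}\,(2\sigma^2)^2\,\frac{3\sqrt{\pi}}{4} = 3\sigma^4 \tag{3.27}$$

Hence

$$\kappa = \frac{E\big[(X - \bar{X})^4\big]}{\sigma^4} - 3 = 3 - 3 = 0 \tag{3.28}$$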
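
The closed form (3.22) can be compared against direct numerical integration of $x^n f_X(x)$. The sketch below uses `math.gamma` from the Python standard library; the integration range of ±12σ, the step count and the value of σ are arbitrary choices made for this illustration.

```python
import math

def gaussian_moment(n, sigma):
    """E[X^n] for a zero-mean Gaussian of variance sigma^2, using (3.22)."""
    if n % 2 == 1:
        return 0.0
    return (2.0 * sigma ** 2) ** (n // 2) * math.gamma((n + 1) / 2) / math.sqrt(math.pi)

def moment_by_integration(n, sigma, steps=40000):
    """Trapezoidal estimate of the integral of x^n f_X(x) over +/- 12 sigma."""
    a, b = -12.0 * sigma, 12.0 * sigma
    h = (b - a) / steps

    def g(x):
        return x ** n * math.exp(-x * x / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

    return h * (0.5 * g(a) + sum(g(a + k * h) for k in range(1, steps)) + 0.5 * g(b))

sigma = 2.0
for n in range(0, 7):
    print(n, gaussian_moment(n, sigma), moment_by_integration(n, sigma))
```

With σ = 2 the second and fourth moments should come out as σ² = 4 and 3σ⁴ = 48, matching (3.25) and (3.27).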



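3.2.3 Evaluation of the Gamma Function

From the definition of $\Gamma$ and substituting $u = x^2$:

$$\begin{aligned}
\Gamma\!\left(\tfrac{1}{2}\right) &= \int_{0}^{\infty} u^{-\frac{1}{2}} e^{-u}\,du \\
&= \int_{0}^{\infty} x^{-1} e^{-x^2}\,2x\,dx \\
&= \int_{-\infty}^{\infty} e^{-x^2}\,dx
\end{aligned} \tag{3.29}$$

Using the following squaring trick to convert this to a 2-D integral in polar coordinates:

$$\begin{aligned}
\Gamma^2\!\left(\tfrac{1}{2}\right) &= \int_{-\infty}^{\infty} e^{-x^2}\,dx \int_{-\infty}^{\infty} e^{-y^2}\,dy \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(x^2 + y^2)}\,dx\,dy \\
&= \int_{-\pi}^{\pi}\!\int_{0}^{\infty} e^{-r^2}\,r\,dr\,d\theta \\
&= 2\pi \left(-\tfrac{1}{2}\right) e^{-r^2}\Big|_{0}^{\infty} = \pi
\end{aligned} \tag{3.30}$$

and so (ignoring the negative square root):

$$\Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi} \approx 1.7725 \tag{3.31}$$

Hence, using $\Gamma(z + 1) = z\,\Gamma(z)$:

$$\Gamma\!\left(\left\{\tfrac{3}{2}, \tfrac{5}{2}, \tfrac{7}{2}, \tfrac{9}{2}, \ldots\right\}\right) = \left\{\tfrac{1}{2}\sqrt{\pi},\; \tfrac{3}{4}\sqrt{\pi},\; \tfrac{15}{8}\sqrt{\pi},\; \tfrac{105}{16}\sqrt{\pi},\; \ldots\right\}$$

The case for $z = 1$ is straightforward:

$$\Gamma(1) = \int_{0}^{\infty} u^{0} e^{-u}\,du = -e^{-u}\Big|_{0}^{\infty} = 1 \tag{3.33}$$

so

$$\Gamma(\{2, 3, 4, 5, \ldots\}) = \{1, 2, 6, 24, \ldots\} \tag{3.34}$$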



3.3 Sums of Random Variables 4 

Consider the random variable $Y$ formed as the sum of two independent random variables $X_1$ and $X_2$:

$$Y = X_1 + X_2 \tag{3.35}$$

where $X_1$ has pdf $f_1(x_1)$ and $X_2$ has pdf $f_2(x_2)$.

We can write the joint pdf for $y$ and $x_1$ by rewriting the conditional probability formula:

$$f(y, x_1) = f(y \mid x_1)\,f_1(x_1) \tag{3.36}$$

It is clear that the event '$Y$ takes the value $y$ conditional upon $X_1 = x_1$' is equivalent to $X_2$ taking a value
$y - x_1$ (since $X_2 = Y - X_1$). Hence

$$f(y \mid x_1) = f_2(y - x_1) \tag{3.37}$$

Now $f(y)$ may be obtained using the Marginal Probability formula ((1.28) in Section 1.4.2: Probability
density functions). Hence

$$\begin{aligned}
f(y) &= \int f(y \mid x_1)\,f_1(x_1)\,dx_1 \\
&= \int f_2(y - x_1)\,f_1(x_1)\,dx_1 \\
&= f_2 * f_1
\end{aligned} \tag{3.38}$$



4 This content is available online at <http://cnx.Org/content/mll070/2.3/>. 
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This result may be extended to sums of three or more random variables by repeated application of the above 
arguments for each new variable in turn. Since convolution is a commutative and associative operation, for $n$ independent variables we get:

$$\begin{aligned}
f(y) &= f_n * \left(f_{n-1} * \cdots * f_2 * f_1\right) \\
&= f_n * f_{n-1} * \cdots * f_2 * f_1
\end{aligned} \tag{3.39}$$

An example of this effect occurs when multiple dice are thrown and the scores are added together. In the 2-dice example of Figure 1.3 (subfigures a, b, c) in the discussion of probability distributions, we saw how the pmf approximated a triangular shape. This is just the convolution of two uniform 6-point pmfs for each of the two dice.
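The dice example can be reproduced directly as a discrete convolution of pmfs. This is a small sketch using exact fractions; the `convolve` helper is written out by hand here purely for illustration.

```python
from fractions import Fraction

def convolve(p, q):
    """Discrete convolution of two pmfs given as lists of probabilities."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for k, q_k in enumerate(q):
            out[i + k] += p_i * q_k
    return out

die = [Fraction(1, 6)] * 6            # uniform pmf for scores 1..6
two_dice = convolve(die, die)         # pmf for total scores 2..12
three_dice = convolve(two_dice, die)  # pmf for total scores 3..18

for score, prob in enumerate(two_dice, start=2):
    print(score, prob)
```

The two-dice pmf rises linearly from 1/36 to 6/36 and falls back again, which is the triangular shape referred to above; convolving once more with the single-die pmf gives the three-dice case.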

Similarly, if two variables with Gaussian pdfs are added together, we shall show in Section 3.4.2 (Summation of two or more Gaussian random variables) that this produces another Gaussian pdf whose variance is the sum of the two input variances.

3.4 Characteristic Functions 5 

You have already encountered the Moment Generating Function of a pdf in the Part IB probability 
course. This function was closely related to the Laplace Transform of the pdf. 

Now we introduce the Characteristic Function for a random variable, which is closely related to the 
Fourier Transform of the pdf. 

In the same way that Fourier Transforms allow easy manipulation of signals when they are convolved 
with linear system impulse responses, Characteristic Functions allow easy manipulation of convolved pdfs 
when they represent sums of random processes. 

The Characteristic Function of a pdf is defined as: 

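$$\begin{aligned}
\Phi_X(u) &= E\left[e^{juX}\right] \\
&= \int_{-\infty}^{\infty} e^{jux} f_X(x)\,dx \\
&= \mathcal{F}(-u)
\end{aligned} \tag{3.40}$$

where $\mathcal{F}(u)$ is the Fourier Transform of the pdf.

Note that whenever $f_X$ is a valid pdf, $\Phi(0) = \int f_X(x)\,dx = 1$.

Properties of Fourier Transforms apply with $-u$ substituted for $\omega$. In particular:

• Convolution - (sums of independent rv's)

$$Y = \sum_{i=1}^{N} X_i \;\Rightarrow\; f_Y = f_{X_1} * f_{X_2} * \cdots * f_{X_N} \;\Rightarrow\; \Phi_Y(u) = \prod_{i=1}^{N} \Phi_{X_i}(u) \tag{3.41}$$

• Inversion

$$f_X(x) = \frac{1}{2\pi}\int e^{-jux}\,\Phi_X(u)\,du \tag{3.42}$$

• Moments

$$\frac{d^n}{du^n}\Phi_X(u) = \int (jx)^n e^{jux} f_X(x)\,dx \;\Rightarrow\; E\left[X^n\right] = \int x^n f_X(x)\,dx = \frac{1}{j^n}\,\frac{d^n}{du^n}\Phi_X(u)\bigg|_{u=0} \tag{3.43}$$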



5 This content is available online at <http://cnx.Org/content/mll071/2.3/>.

• Scaling If $Y = aX$, then $f_Y(y) = \frac{f_X(x)}{a}$ (for $a > 0$) from our previous discussion of functions of random variables, so

$$\begin{aligned}
\Phi_Y(u) &= \int e^{juy} f_Y(y)\,dy \\
&= \int e^{juax} f_X(x)\,dx \\
&= \Phi_X(au)
\end{aligned} \tag{3.44}$$

3.4.1 Characteristic Function of a Gaussian pdf 

The Gaussian or normal distribution is very important, largely because of the Central Limit Theorem 
which we shall prove below. Because of this (and as part of the proof of this theorem) we shall show here 
that a Gaussian pdf has a Gaussian characteristic function too. 
A Gaussian distribution with mean $\mu$ and variance $\sigma^2$ has pdf:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{3.45}$$

Its characteristic function is obtained as follows, using a trick known as completing the square of the exponent:

$$\begin{aligned}
\Phi_X(u) &= E\left[e^{juX}\right] \\
&= \int e^{jux} f_X(x)\,dx \\
&= \frac{1}{\sqrt{2\pi\sigma^2}} \int e^{-\frac{x^2 - 2\mu x + \mu^2 - 2\sigma^2 jux}{2\sigma^2}}\,dx \\
&= \left(\frac{1}{\sqrt{2\pi\sigma^2}} \int e^{-\frac{\left(x - (\mu + ju\sigma^2)\right)^2}{2\sigma^2}}\,dx\right) e^{\frac{2ju\mu\sigma^2 + j^2u^2\sigma^4}{2\sigma^2}} \\
&= e^{ju\mu}\, e^{-\frac{u^2\sigma^2}{2}}
\end{aligned} \tag{3.46}$$

since the integral in brackets is similar to a Gaussian pdf and integrates to unity.

Thus the characteristic function of a Gaussian pdf is also Gaussian in magnitude, $e^{-\frac{u^2\sigma^2}{2}}$, with standard deviation $\frac{1}{\sigma}$, and with a linear phase rotation term, $e^{ju\mu}$, whose rate of rotation equals the mean $\mu$ of the pdf. This coincides with standard results from Fourier analysis of Gaussian waveforms and their spectra (e.g. Fourier transform of a Gaussian waveform with time shift).
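As a sanity check on (3.46), the defining integral $E[e^{juX}]$ can be evaluated numerically and compared with the closed form. The sketch below uses a plain trapezoidal rule; the integration range and step count are arbitrary illustrative choices.

```python
import cmath
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def char_fn_numeric(u, mu, sigma, half_width=12.0, steps=20000):
    """Trapezoidal approximation of E[exp(j u X)] over mu +/- half_width*sigma."""
    a = mu - half_width * sigma
    dx = 2 * half_width * sigma / steps
    total = 0j
    for k in range(steps + 1):
        x = a + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(1j * u * x) * gaussian_pdf(x, mu, sigma)
    return total * dx

def char_fn_closed_form(u, mu, sigma):
    """Closed form from (3.46): exp(j u mu) * exp(-u^2 sigma^2 / 2)."""
    return cmath.exp(1j * u * mu) * math.exp(-(u**2) * sigma**2 / 2)

mu, sigma = 1.0, 0.5
for u in (0.0, 0.7, 2.0):
    print(u, char_fn_numeric(u, mu, sigma), char_fn_closed_form(u, mu, sigma))
```

At $u = 0$ both expressions return 1, consistent with the valid-pdf property of characteristic functions.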

3.4.2 Summation of two or more Gaussian random variables 

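If two independent variables, $X_1$ and $X_2$, with Gaussian pdfs are summed to produce $X$, their characteristic functions will be multiplied together (equivalent to convolving their pdfs) to give

$$\begin{aligned}
\Phi_X(u) &= \Phi_{X_1}(u)\,\Phi_{X_2}(u) \\
&= e^{ju(\mu_1 + \mu_2)}\, e^{-\frac{u^2\left(\sigma_1^2 + \sigma_2^2\right)}{2}}
\end{aligned} \tag{3.47}$$

This is the characteristic function of a Gaussian pdf with mean $(\mu_1 + \mu_2)$ and variance $(\sigma_1^2 + \sigma_2^2)$.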

Further Gaussian variables can be added and the pdf will remain Gaussian with further terms added to 
the above expressions for the combined mean and variance. 
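A quick Monte Carlo experiment illustrates this result. The sketch below is only a plausibility check under assumed parameter values (the means, standard deviations, seed and sample size are arbitrary); it estimates the mean, variance and excess kurtosis of the sum of two independent Gaussian variables.

```python
import random

rng = random.Random(3)            # fixed seed so the run is repeatable
mu1, sigma1 = 1.0, 2.0            # assumed parameters of X1
mu2, sigma2 = -0.5, 1.5           # assumed parameters of X2
n = 50000

samples = [rng.gauss(mu1, sigma1) + rng.gauss(mu2, sigma2) for _ in range(n)]

mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n
fourth = sum((s - mean) ** 4 for s in samples) / n
excess_kurt = fourth / var**2 - 3.0

print("mean     :", mean, "expected", mu1 + mu2)
print("variance :", var, "expected", sigma1**2 + sigma2**2)
print("excess kurtosis:", excess_kurt, "expected 0 for a Gaussian")
```

Matching the first two moments and a near-zero excess kurtosis is consistent with, though not a proof of, the sum being Gaussian.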

3.4.3 Central Limit Theorem 

The central limit theorem states broadly that if a large number $N$ of independent random variables of arbitrary pdf, but with equal variance $\sigma^2$ and zero mean, are summed together and scaled by $\frac{1}{\sqrt{N}}$ to keep the total energy independent of $N$, then the pdf of the resulting variable will tend to a zero-mean Gaussian with variance $\sigma^2$ as $N$ tends to infinity.
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This result is obvious from the previous result if the input pdfs are also Gaussian, but it is the 
fact that it applies for arbitrary input pdfs that is remarkable, and is the reason for the importance of 
the Gaussian (or normal) pdf. Noise generated in nature is nearly always the result of summing many tiny 
random processes (e.g. noise from electron energy transitions in a resistor or transistor, or from distant 
worldwide thunder storms at a radio antenna) and hence tends to a Gaussian pdf. 

Although for simplicity, we shall prove the result only for the case when all the summed processes have 
the same variance and pdfs, the central limit result is more general than this and applies in many cases 
even when the variance and pdfs are not all the same. 

3.4.3.1 Proof: 

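Let $X_i$ ($i = 1$ to $N$) be the $N$ independent random processes, each with zero mean and variance $\sigma^2$, which are combined to give

$$X = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} X_i \tag{3.48}$$

Then, if the characteristic function of each input process before scaling is $\Phi(u)$ and we use (3.44) to include the scaling by $\frac{1}{\sqrt{N}}$, the characteristic function of $X$ is

$$\Phi_X(u) = \prod_{i=1}^{N} \Phi_{X_i}\!\left(\frac{u}{\sqrt{N}}\right) = \Phi^N\!\left(\frac{u}{\sqrt{N}}\right) \tag{3.49}$$

Taking logs:

$$\log \Phi_X(u) = N \log \Phi\!\left(\frac{u}{\sqrt{N}}\right) \tag{3.50}$$

Using Taylor's theorem to expand $\Phi\!\left(\frac{u}{\sqrt{N}}\right)$ in terms of its derivatives at $u = 0$ (and hence its moments) gives

$$\Phi\!\left(\frac{u}{\sqrt{N}}\right) = \Phi(0) + \frac{u}{\sqrt{N}}\,\Phi'(0) + \frac{1}{2}\left(\frac{u}{\sqrt{N}}\right)^{2}\Phi''(0) + \frac{1}{6}\left(\frac{u}{\sqrt{N}}\right)^{3}\Phi'''(0) + \frac{1}{24}\left(\frac{u}{\sqrt{N}}\right)^{4}\Phi''''(0) + \cdots \tag{3.51}$$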



From the Moments property of characteristic functions with zero mean:

• valid pdf

$$\Phi(0) = E\left[X_i^{0}\right] = 1$$

• zero mean

$$\Phi'(0) = j\,E\left[X_i\right] = 0$$

• variance

$$\Phi''(0) = j^2 E\left[X_i^{2}\right] = -\sigma^2$$

• scaled skewness

$$\Phi'''(0) = j^3 E\left[X_i^{3}\right] = -j\gamma\sigma^3$$

• scaled kurtosis

$$\Phi''''(0) = j^4 E\left[X_i^{4}\right] = (\kappa + 3)\,\sigma^4$$

These are all constants, independent of $N$, and dependent only on the shape of the pdfs $f_{X_i}$.
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Substituting these moments into (3.50) and (3.51) and using the series expansion, log (1 + x) 
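$= x +$ (terms of order $x^2$ or smaller), gives

$$\begin{aligned}
\log \Phi_X(u) &= N \log \Phi\!\left(\frac{u}{\sqrt{N}}\right) \\
&= N \log\left(1 - \frac{u^2}{2N}\,\sigma^2 + {**}\right) \\
&= -\frac{u^2\sigma^2}{2} + \#\#
\end{aligned} \tag{3.52}$$

where ** represents the terms of order $N^{-\frac{3}{2}}$ or smaller and ## represents the terms of order $N^{-\frac{1}{2}}$ or smaller. As $N \to \infty$,

$$\log \Phi_X(u) \to -\frac{u^2\sigma^2}{2}$$

Therefore, as $N \to \infty$

$$\Phi_X(u) \to e^{-\frac{u^2\sigma^2}{2}} \tag{3.53}$$

Note that, if the input pdfs are symmetric, the skewness will be zero and the error terms will decay as $N^{-1}$ rather than $N^{-\frac{1}{2}}$, and so convergence to a Gaussian characteristic function will be more rapid.

Hence we may now infer from (3.45), (3.46) and (3.53) that the pdf of $X$ as $N \to \infty$ will be given by

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}} \tag{3.54}$$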

Thus we have proved the required central limit result. 
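The convergence in (3.52) and (3.53) can be observed numerically for a specific input pdf. The sketch below assumes a uniform pdf on $[-a, a]$, whose characteristic function is $\sin(au)/(au)$ and whose variance is $a^2/3$; this choice of input pdf is purely illustrative.

```python
import math

def phi_uniform(u, a):
    """Characteristic function of a uniform pdf on [-a, a]."""
    if u == 0.0:
        return 1.0
    return math.sin(a * u) / (a * u)

def log_phi_sum(u, a, n):
    """log of the characteristic function of (X_1 + ... + X_n)/sqrt(n), as in (3.50)."""
    return n * math.log(phi_uniform(u / math.sqrt(n), a))

a = math.sqrt(3.0)        # gives unit variance: sigma^2 = a^2 / 3 = 1
u = 1.0
limit = -(u**2) / 2       # the Gaussian limit -u^2 sigma^2 / 2

errors = {n: log_phi_sum(u, a, n) - limit for n in (1, 10, 100, 1000)}
for n, err in errors.items():
    print(n, err)
```

Because the uniform pdf is symmetric, the error should shrink roughly in proportion to $1/N$, in line with the remark above about symmetric input pdfs.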

Figure 3.1(a) shows an example of convergence when the input pdfs are uniform, and N is gradually 
increased from 1 to 50. By N = 12, convergence is good, and this is how some 'Gaussian' random generator 
functions operate - by summing typically 12 uncorrelated random numbers with uniform pdfs. 
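A generator of this kind can be sketched as follows. This is an illustration of the idea rather than any particular library's implementation: each uniform number on [0, 1) has mean 1/2 and variance 1/12, so the sum of 12 of them, minus 6, has zero mean and unit variance.

```python
import random

def approx_gaussian(rng):
    """Approximate zero-mean, unit-variance Gaussian sample from 12 uniform numbers."""
    return sum(rng.random() for _ in range(12)) - 6.0

rng = random.Random(0)    # fixed seed so the run is repeatable
samples = [approx_gaussian(rng) for _ in range(50000)]

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print("mean:", mean, "variance:", var)
print("largest magnitude:", max(abs(s) for s in samples))
```

One visible limitation of the method is that the output can never exceed 6 in magnitude, so the extreme tails of the Gaussian are not reproduced.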

For some less smooth or more skewed pdfs, convergence can be slower, as shown for a highly skewed 
triangular pdf in Figure 3.1(b); and pdfs of discrete processes are particularly problematic in this respect, 
as illustrated in Figure 3.1(c). 



Figure 3.1: (a) Uniform pdf, N = 50; (b) Triangular pdf with high skew, N = 30; (c) Approximate binary pdf, N = 50.



Chapter 4 

Correlation Functions and Power Spectra 



4.1 Random Processes 1 

We discussed Random Signals (Section 2.2) briefly and now we return to consider them in detail. We shall 
assume that they evolve continuously with time t, although they may equally well evolve with distance (e.g. 
a random texture in image processing) or some other parameter. 

We can imagine a generalization of our previous ideas about random experiments so that the outcome of 
an experiment can be a 'Random Object', an example of which is a signal waveform chosen at random from 
a set of possible signal waveforms, which we term an Ensemble. This ensemble of random signals is known 
as a Random Process. 



1 This content is available online at <http://cnx.Org/content/mlll00/2.6/>.

Figure 4.1: Ensemble representation of a random process. 



An example of a Random Process $X(t, \alpha)$ is shown in Figure 4.1, where $t$ is time and $\alpha$ is an index to the various members of the ensemble.

• $t$ is assumed to belong to some set $\mathcal{T}$ (the time axis).

• $\alpha$ is assumed to belong to some set $\mathcal{A}$ (the sample space).

• If $\mathcal{T}$ is a continuous set, such as $\mathbb{R}$ or $[0, \infty)$, then the process is termed a Continuous Time random process.

• If $\mathcal{T}$ is a discrete set of time values, such as the integers $\mathbb{Z}$, the process is termed a Discrete Time Process or Time Series.

• The members of the ensemble can be the result of different random events, such as different instances of the sound 'ah' during the course of this lecture. In this case $\alpha$ is discrete.

• Alternatively the ensemble members are often just different portions of a single random signal. If the signal is a continuous waveform, then $\alpha$ may also be a continuous variable, indicating the starting point of each ensemble waveform.

We will often drop the explicit dependence on $\alpha$ for notational convenience, referring simply to random process $\{X(t)\}$.

If we consider the process $\{X(t)\}$ at one particular time $t = t_1$, then we have a random variable $X(t_1)$.
If we consider the process $\{X(t)\}$ at $N$ time instants $\{t_1, t_2, \ldots, t_N\}$, then we have a random vector:

$$\mathbf{X} = \left(\, X(t_1) \;\; X(t_2) \;\; \ldots \;\; X(t_N) \,\right)^T$$



We can study the properties of a random process by considering the behavior of random variables and 
random vectors extracted from the process, using the probability theory derived earlier in this course. 

4.2 Correlation and Covariance 2 

Correlation and covariance are techniques for measuring the similarity of one signal to another. For a random process $X(t, \alpha)$ they are defined as follows.

• Auto-correlation function:

$$\begin{aligned}
r_{XX}(t_1, t_2) &= E\left[X(t_1, \alpha)\, X(t_2, \alpha)\right] \\
&= \iint x_1 x_2\, f(x_1, x_2)\,dx_1\,dx_2
\end{aligned} \tag{4.1}$$

where the expectation is performed over all $\alpha \in \mathcal{A}$ (i.e. the whole ensemble), and $f(x_1, x_2)$ is the joint pdf when $x_1$ and $x_2$ are samples taken at times $t_1$ and $t_2$ from the same random event $\alpha$ of the random process $X$.

• Auto-covariance function:

$$\begin{aligned}
c_{XX}(t_1, t_2) &= E\left[\left(X(t_1, \alpha) - \bar{X}(t_1)\right)\left(X(t_2, \alpha) - \bar{X}(t_2)\right)\right] \\
&= \iint \left(x_1 - \bar{X}(t_1)\right)\left(x_2 - \bar{X}(t_2)\right) f(x_1, x_2)\,dx_1\,dx_2 \\
&= r_{XX}(t_1, t_2) - 2\bar{X}(t_1)\bar{X}(t_2) + \bar{X}(t_1)\bar{X}(t_2) \\
&= r_{XX}(t_1, t_2) - \bar{X}(t_1)\bar{X}(t_2)
\end{aligned} \tag{4.2}$$

where the same conditions apply as for auto-correlation and the means $\bar{X}(t_1)$ and $\bar{X}(t_2)$ are taken over all $\alpha \in \mathcal{A}$. Covariances are similar to correlations except that the effects of the means are removed.

• Cross-correlation function: If we have two different processes, $X(t, \alpha)$ and $Y(t, \alpha)$, both arising as a result of the same random event $\alpha$, then cross-correlation is defined as

$$\begin{aligned}
r_{XY}(t_1, t_2) &= E\left[X(t_1, \alpha)\, Y(t_2, \alpha)\right] \\
&= \iint x_1 y_2\, f(x_1, y_2)\,dx_1\,dy_2
\end{aligned} \tag{4.3}$$

where $f(x_1, y_2)$ is the joint pdf when $x_1$ and $y_2$ are samples of $X$ and $Y$ taken at times $t_1$ and $t_2$ as a result of the same random event $\alpha$. Again the expectation is performed over all $\alpha \in \mathcal{A}$.

• Cross-covariance function:

$$\begin{aligned}
c_{XY}(t_1, t_2) &= E\left[\left(X(t_1, \alpha) - \bar{X}(t_1)\right)\left(Y(t_2, \alpha) - \bar{Y}(t_2)\right)\right] \\
&= \iint \left(x_1 - \bar{X}(t_1)\right)\left(y_2 - \bar{Y}(t_2)\right) f(x_1, y_2)\,dx_1\,dy_2 \\
&= r_{XY}(t_1, t_2) - \bar{X}(t_1)\bar{Y}(t_2)
\end{aligned} \tag{4.4}$$



2 This content is available online at <http://cnx.Org/content/mlll01/2.3/>. 




For Deterministic Random Processes which depend deterministically on the random variable $\alpha$ (or some function of it), we can simplify the above integrals by expressing the joint pdf in that space. E.g. for auto-correlation:

$$\begin{aligned}
r_{XX}(t_1, t_2) &= E\left[X(t_1, \alpha)\, X(t_2, \alpha)\right] \\
&= \int x(t_1, \alpha)\, x(t_2, \alpha)\, f(\alpha)\,d\alpha
\end{aligned} \tag{4.5}$$
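To make (4.5) concrete, consider a hypothetical deterministic random process that is not taken from the text: $X(t, \alpha) = \cos(\omega t + \alpha)$ with $\alpha$ uniformly distributed on $[0, 2\pi)$. The sketch below evaluates the integral over $\alpha$ numerically; for this process the result is known to be $\tfrac{1}{2}\cos(\omega(t_2 - t_1))$.

```python
import math

def autocorr_random_phase(t1, t2, omega, steps=1000):
    """E[X(t1) X(t2)] for X(t, alpha) = cos(omega*t + alpha), alpha ~ U[0, 2*pi).

    Evaluates the integral in (4.5) over alpha with a midpoint rule.
    """
    d_alpha = 2 * math.pi / steps
    f_alpha = 1 / (2 * math.pi)          # uniform pdf of alpha
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) * d_alpha
        total += math.cos(omega * t1 + alpha) * math.cos(omega * t2 + alpha) * f_alpha
    return total * d_alpha

omega = 2.0
r_a = autocorr_random_phase(0.3, 1.1, omega)
r_b = autocorr_random_phase(5.3, 6.1, omega)   # same time difference, shifted origin
print(r_a, r_b, 0.5 * math.cos(omega * 0.8))
```

The two estimates agree because, for this example, the autocorrelation depends only on the time difference $t_2 - t_1$.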



4.3 Stationarity 3 

Stationarity in a Random Process implies that its statistical characteristics do not change with time. 
Put another way, if one were to observe a stationary random process at some time t it would be impossible 
to distinguish the statistical characteristics at that time from those at some other time t'. 

4.3.1 Strict Sense Stationarity (SSS) 

Choose a Random Vector of length N from a Random Process: 

$$\mathbf{X} = \left(\, X(t_1) \;\; X(t_2) \;\; \ldots \;\; X(t_N) \,\right)^T \tag{4.6}$$

Its $N$th order cdf is

$$F_{X(t_1), \ldots, X(t_N)}(x_1, \ldots, x_N) = \Pr\left[\{X(t_1) \le x_1, \ldots, X(t_N) \le x_N\}\right] \tag{4.7}$$

$X(t)$ is defined to be Strict Sense Stationary iff:

$$F_{X(t_1), \ldots, X(t_N)}(x_1, \ldots, x_N) = F_{X(t_1 + c), \ldots, X(t_N + c)}(x_1, \ldots, x_N) \tag{4.8}$$

for all time shifts $c$, all finite $N$ and all sets of time points $\{t_1, \ldots, t_N\}$.

4.3.2 Wide Sense (Weak) Stationarity (WSS) 

If we are only interested in the properties of moments up to 2nd order (mean, autocorrelation, covariance, 
...), which is the case for many practical applications, a weaker form of stationarity can be useful: 
$X(t)$ is defined to be Wide Sense Stationary (or Weakly Stationary) iff:

1. The mean value is independent of $t$, for all $t$

$$E[X(t)] = \mu \tag{4.9}$$

2. Autocorrelation depends only upon $\tau = t_2 - t_1$, for all $t_1$

$$\begin{aligned}
E\left[X(t_1)\, X(t_2)\right] &= E\left[X(t_1)\, X(t_1 + \tau)\right] \\
&= r_{XX}(\tau)
\end{aligned} \tag{4.10}$$

Note that, since 2nd-order moments are defined in terms of 2nd-order probability distributions, strict sense 
stationary processes are always wide-sense stationary, but not necessarily vice versa. 



3 This content is available online at <http://cnx.Org/content/mlll02/2.3/>. 
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4.4 Ergodicity 4 

Many stationary random processes are also Ergodic. For an Ergodic Random Process we can exchange 
Ensemble Averages for Time Averages. This is equivalent to assuming that our ensemble of random 
signals is just composed of all possible time shifts of a single signal X (t). 

Recall from our previous discussion of Expectation (3.4) that the expectation of a function of a random 
variable is given by 

$$E[g(X)] = \int g(x)\, f_X(x)\,dx \tag{4.11}$$

This result also applies if we have a random function $g(.)$ of a deterministic variable such as $t$. Hence

$$E[g(t)] = \int g(t)\, f_T(t)\,dt \tag{4.12}$$

Because $t$ is linearly increasing, the pdf $f_T(t)$ is uniform over our measurement interval, say $-T$ to $T$, and will be $\frac{1}{2T}$ to make the pdf valid (integral = 1). Hence

$$\begin{aligned}
E[g(t)] &= \int_{-T}^{T} g(t)\,\frac{1}{2T}\,dt \\
&= \frac{1}{2T}\int_{-T}^{T} g(t)\,dt
\end{aligned} \tag{4.13}$$

If we wish to measure over all time, then we take the limit as $T \to \infty$.
This leads to the following results for Ergodic WSS random processes: 

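• Mean Ergodic:

$$\begin{aligned}
E[X(t)] &= \int_{-\infty}^{\infty} x\, f_{X(t)}(x)\,dx \\
&= \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} X(t)\,dt
\end{aligned} \tag{4.14}$$

• Correlation Ergodic:

$$\begin{aligned}
r_{XX}(\tau) &= E\left[X(t)\, X(t + \tau)\right] \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1 x_2\, f_{X(t), X(t+\tau)}(x_1, x_2)\,dx_1\,dx_2 \\
&= \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} X(t)\, X(t + \tau)\,dt
\end{aligned} \tag{4.15}$$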

and similarly for other correlation or covariance functions. 

Ergodicity greatly simplifies the measurement of WSS processes and it is often assumed when 
estimating moments (or correlations) for such processes. 

In almost all practical situations, processes are stationary only over some limited time interval (say $T_1$ to $T_2$) rather than over all time. In that case we deliberately keep the limits of the integral finite and adjust $f_{X(t)}$ accordingly. For example the autocorrelation function is then measured using

$$r_{XX}(\tau) = \frac{1}{T_2 - T_1}\int_{T_1}^{T_2} X(t)\, X(t + \tau)\,dt \tag{4.16}$$

This avoids including samples of X which have incorrect statistics, but it can suffer from errors due to 
limited sample size. 
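A discrete-time analogue of the finite-interval estimate (4.16) is sketched below. The record here is a hypothetical one, made of independent Gaussian samples with variance 4, chosen only because its true autocorrelation is known (4 at zero lag, 0 elsewhere); the function name and parameters are illustrative assumptions.

```python
import random

def time_average_acf(x, lag):
    """Estimate r_XX(lag) from one record by averaging x[n]*x[n+lag] over usable samples."""
    n_terms = len(x) - lag
    return sum(x[n] * x[n + lag] for n in range(n_terms)) / n_terms

rng = random.Random(1)                               # fixed seed for repeatability
x = [rng.gauss(0.0, 2.0) for _ in range(20000)]      # hypothetical ergodic record

r0 = time_average_acf(x, 0)    # should be close to the variance, 4
r5 = time_average_acf(x, 5)    # should be close to 0
print(r0, r5)
```

Repeating the experiment with a much shorter record shows the estimates scattering more widely around their true values, which is the limited-sample-size error mentioned above.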



4 This content is available online at <http://cnx.Org/content/mlll03/2.3/>.
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4.5 Spectral Properties of Random Signals 5 

4.5.1 Relation of Power Spectral Density to ACF

The autocorrelation function (ACF) of an ergodic random signal tells us how correlated the signal is with itself as a function of time shift $\tau$. In particular, for $\tau = 0$

$$\begin{aligned}
r_{XX}(0) &= \lim_{T \to \infty} \frac{1}{2T}\int_{-T}^{T} X^2(t)\,dt \\
&= \text{mean power of } X(t)
\end{aligned} \tag{4.17}$$

Note that if $T \to \infty$, for all $\tau$

$$r_{XX}(\tau) = r_{XX}(-\tau) \le r_{XX}(0) \tag{4.18}$$

As $\tau$ becomes large, $X(t)$ and $X(t + \tau)$ will usually become decorrelated and, as long as $X$ is zero mean, $r_{XX}$ will tend to zero.

Hence the ACF will have its maximum at $\tau = 0$ and decay symmetrically to zero (or to $\mu^2$, if $\mu \ne 0$) as $|\tau|$ increases.

The width of the ACF (to say its half-power points) tells us how slowly $X$ is fluctuating or how band-limited it is. Figure 4.2(b) shows how the ACF of a rapidly fluctuating (wide-band) random signal, as in Figure 4.2(a) upper plot, decays quickly to zero as $|\tau|$ increases, whereas, for a slowly fluctuating signal, as in Figure 4.2(a) lower plot, the ACF decays much more slowly.



5 This content is available online at <http://cnx.Org/content/mlll04/2.4/>. 
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(a) Random signals with different bandwidths 
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The ACF measures an entirely different aspect of randomness from amplitude distributions 
such as pdf and cdf. 

As with deterministic signals, we may formalize our ideas of rates of fluctuation by transforming to the 
Frequency (Spectral) Domain using the Fourier Transform: 

$$U(\omega) = FT\left(u(t)\right) \tag{4.19}$$

The Power Spectral Density (PSD) of a random process $X$ is defined to be the Fourier Transform of its ACF:

$$\begin{aligned}
S_X(\omega) &= FT\left(r_{XX}(\tau)\right) \\
&= \int r_{XX}(\tau)\, e^{-j\omega\tau}\,d\tau
\end{aligned} \tag{4.20}$$

$$\begin{aligned}
r_{XX}(\tau) &= FT^{-1}\left(S_X(\omega)\right) \\
&= \frac{1}{2\pi}\int S_X(\omega)\, e^{j\omega\tau}\,d\omega
\end{aligned} \tag{4.21}$$

N.B. $\{X(t)\}$ must be at least Wide Sense Stationary (WSS).

From (4.17) and (4.21) we see that the mean signal power is given by:

$$\begin{aligned}
r_{XX}(0) &= \frac{1}{2\pi}\int S_X(\omega)\,d\omega \\
&= \int S_X(2\pi f)\,df
\end{aligned} \tag{4.22}$$

Hence $S_X$ has units of power per Hertz. Note that we must integrate over all frequencies, both positive and negative, to get the correct total power.

Figure 4.2(c) shows how the PSDs of the signals relate to the ACFs in Figure 4.2(b).
Properties of PSDs for real-valued $X(t)$:

1. $S_X(\omega) = S_X(-\omega)$

2. $S_X(\omega)$ is Real-valued

3. $S_X(\omega) \ge 0$

Properties 1 and 2 are because ACFs are real and symmetric about $\tau = 0$; and 3 is because $S_X$ represents power density.
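These three properties can be observed numerically for a specific ACF. The sketch below assumes, purely for illustration, the ACF $r_{XX}(\tau) = e^{-|\tau|}$, whose Fourier transform is $2/(1+\omega^2)$, and evaluates the defining integral (4.20) with a trapezoidal rule over a truncated range.

```python
import cmath
import math

def exp_acf(tau):
    """Assumed example ACF: real and symmetric about tau = 0."""
    return math.exp(-abs(tau))

def psd_from_acf(acf, omega, half_width=40.0, steps=80000):
    """Trapezoidal approximation of S_X(omega) = integral of r_XX(tau) exp(-j omega tau)."""
    dtau = 2 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        tau = -half_width + k * dtau
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * acf(tau) * cmath.exp(-1j * omega * tau)
    return total * dtau

s_pos = psd_from_acf(exp_acf, 1.5)
s_neg = psd_from_acf(exp_acf, -1.5)
print(s_pos, s_neg, 2 / (1 + 1.5**2))
```

The imaginary parts come out at rounding-error level, the values at $\pm\omega$ agree, and the real part is positive, as the three properties require.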
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4.5.2 Linear system (filter) with WSS input 



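Figure 4.3: Block diagram of a linear system with a random input signal, $X(t)$. Time domain: input $X(t)$ with ACF $r_{XX}(\tau)$, impulse response $h(t)$, output $Y(t) = h(t) * X(t)$ with ACF $r_{YY}(\tau)$. Frequency domain: input PSD $S_X(\omega)$, frequency response $H(\omega)$, output PSD $S_Y(\omega) = |H(\omega)|^2 S_X(\omega)$.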



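Let the linear system with input $X(t)$ and output $Y(t)$ have an impulse response $h(t)$, so

$$\begin{aligned}
Y(t) &= h(t) * X(t) \\
&= \int h(\alpha)\, X(t - \alpha)\,d\alpha
\end{aligned} \tag{4.23}$$

Then the ACF of $Y$ is

$$\begin{aligned}
r_{YY}(t_1, t_2) &= E\left[Y(t_1)\, Y(t_2)\right] \\
&= E\left[\int h(\alpha_1)\, X(t_1 - \alpha_1)\,d\alpha_1 \int h(\alpha_2)\, X(t_2 - \alpha_2)\,d\alpha_2\right] \\
&= E\left[\iint h(\alpha_1)\, h(\alpha_2)\, X(t_1 - \alpha_1)\, X(t_2 - \alpha_2)\,d\alpha_1\,d\alpha_2\right] \\
&= \iint h(\alpha_1)\, h(\alpha_2)\, E\left[X(t_1 - \alpha_1)\, X(t_2 - \alpha_2)\right]d\alpha_1\,d\alpha_2 \\
&= \iint h(\alpha_1)\, h(\alpha_2)\, r_{XX}(t_1 - \alpha_1, t_2 - \alpha_2)\,d\alpha_1\,d\alpha_2
\end{aligned} \tag{4.24}$$

If $X$ is WSS then

$$\begin{aligned}
r_{YY}(\tau) &= E\left[Y(t)\, Y(t + \tau)\right] \\
&= \iint h(\alpha_1)\, h(\alpha_2)\, r_{XX}(\tau + \alpha_1 - \alpha_2)\,d\alpha_1\,d\alpha_2 \\
&= r_{XX}(\tau) * h(-\tau) * h(\tau)
\end{aligned} \tag{4.25}$$

Taking Fourier transforms:

$$\begin{aligned}
S_Y(\omega) &= FT\left(r_{YY}(\tau)\right) \\
&= \int\!\!\iint h(\alpha_1)\, h(\alpha_2)\, r_{XX}(\tau + \alpha_1 - \alpha_2)\,d\alpha_1\,d\alpha_2\; e^{-j\omega\tau}\,d\tau \\
&= \iint h(\alpha_1)\, h(\alpha_2) \int r_{XX}(\tau + \alpha_1 - \alpha_2)\, e^{-j\omega\tau}\,d\tau\,d\alpha_1\,d\alpha_2 \\
&= \iint h(\alpha_1)\, h(\alpha_2) \int r_{XX}(\lambda)\, e^{-j\omega(\lambda - \alpha_1 + \alpha_2)}\,d\lambda\,d\alpha_1\,d\alpha_2 \\
&= \int h(\alpha_1)\, e^{j\omega\alpha_1}\,d\alpha_1 \int h(\alpha_2)\, e^{-j\omega\alpha_2}\,d\alpha_2 \int r_{XX}(\lambda)\, e^{-j\omega\lambda}\,d\lambda \\
&= H^{*}(\omega)\, H(\omega)\, S_X(\omega)
\end{aligned} \tag{4.26}$$

where $H(\omega) = FT\left(h(t)\right)$, i.e.:

$$S_Y(\omega) = |H(\omega)|^2\, S_X(\omega) \tag{4.27}$$

Hence the PSD of $Y$ = the PSD of $X$ × the power gain $|H|^2$ of the system at frequency $\omega$.

Thus if a large and important system is subject to random perturbations (e.g. a power plant subject to random load fluctuations), we may measure $r_{XX}(\tau)$ and $r_{YY}(\tau)$, transform these to $S_X(\omega)$ and $S_Y(\omega)$, and hence obtain

$$|H(\omega)| = \sqrt{\frac{S_Y(\omega)}{S_X(\omega)}} \tag{4.28}$$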

Hence we may measure the system frequency response without taking the plant off line. But this does not give any information about the phase of $H(\omega)$.

However, if instead we measure the Cross-Correlation Function (CCF) between $X$ and $Y$, we get:

$$\begin{aligned}
r_{XY}(t_1, t_2) &= E\left[X(t_1)\, Y(t_2)\right] \\
&= E\left[X(t_1) \int h(\alpha_2)\, X(t_2 - \alpha_2)\,d\alpha_2\right] \\
&= E\left[\int h(\alpha_2)\, X(t_1)\, X(t_2 - \alpha_2)\,d\alpha_2\right] \\
&= \int h(\alpha_2)\, E\left[X(t_1)\, X(t_2 - \alpha_2)\right]d\alpha_2 \\
&= \int h(\alpha_2)\, r_{XX}(t_1, t_2 - \alpha_2)\,d\alpha_2
\end{aligned} \tag{4.29}$$

If $X(t)$, and hence $Y(t)$, are WSS:

$$\begin{aligned}
r_{XY}(\tau) &= E\left[X(t)\, Y(t + \tau)\right] \\
&= \int h(\alpha)\, r_{XX}(\tau - \alpha)\,d\alpha \\
&= h(\tau) * r_{XX}(\tau)
\end{aligned} \tag{4.30}$$

and taking Fourier transforms:

$$\begin{aligned}
S_{XY}(\omega) &= FT\left(r_{XY}(\tau)\right) \\
&= H(\omega)\, S_X(\omega)
\end{aligned} \tag{4.31}$$

where $S_{XY}(\omega)$ is known as the Cross Spectral Density between $X$ and $Y$. Therefore,

$$H(\omega) = \frac{S_{XY}(\omega)}{S_X(\omega)} \tag{4.32}$$

Hence we obtain the amplitude and phase of $H(\omega)$. As before, this is achieved without taking the plant off line.

Note that for WSS processes, $r_{XY}(\tau) = r_{YX}(-\tau)$ and that (unlike $r_{XX}$ and $r_{YY}$) these need not be symmetric about $\tau = 0$. Hence the cross spectral density $S_{XY}(\omega)$ need not be purely real (unlike $S_X(\omega)$), and the phase of $S_{XY}(\omega)$ gives the phase of $H(\omega)$.
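The identification idea can be tried out in discrete time. The sketch below makes several simplifying assumptions that are not in the text: the input is white with unit variance, so that the cross-correlation relation $r_{XY} = h * r_{XX}$ reduces to $r_{XY}[k] = h[k]$, and the unknown system is a short hypothetical FIR filter.

```python
import random

def fir_filter(h, x):
    """y[n] = sum_k h[k] * x[n-k], treating samples before the record as zero."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def cross_corr(x, y, lag):
    """Time-average estimate of r_XY(lag) = E[X(t) Y(t+lag)] for lag >= 0."""
    n_terms = len(x) - lag
    return sum(x[n] * y[n + lag] for n in range(n_terms)) / n_terms

rng = random.Random(2)                              # fixed seed for repeatability
h_true = [1.0, 0.5, -0.25]                          # hypothetical impulse response
x = [rng.gauss(0.0, 1.0) for _ in range(40000)]     # white input with unit variance
y = fir_filter(h_true, x)

h_est = [cross_corr(x, y, k) for k in range(5)]
print(h_est)
```

The estimated cross-correlation recovers the sign of each tap as well as its size. With a non-white input one would instead divide the cross spectral density by the input PSD, as in (4.32).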



4.5.3 Physical Interpretation of Power Spectral Density
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Figure 4.4: Narrowband filter frequency response and PSD of filter input and output. 



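Let us pass $X(t)$ through a narrow-band filter of bandwidth $\delta\omega = 2\pi\,\delta f$, as shown in Figure 4.4:

$$H(\omega) = \begin{cases} 1 & \text{if } \omega_0 < |\omega| < \omega_0 + \delta\omega \\ 0 & \text{otherwise} \end{cases} \tag{4.33}$$

Find average power at the filter output (shaded area in Figure 4.4, divided by $2\pi$):

$$\begin{aligned}
P_0 &= r_{YY}(0) \\
&= \frac{1}{2\pi}\int_{-\infty}^{\infty} S_Y(\omega)\,d\omega \\
&= \frac{1}{2\pi}\left(\int_{-(\omega_0 + \delta\omega)}^{-\omega_0} S_X(\omega)\,d\omega + \int_{\omega_0}^{\omega_0 + \delta\omega} S_X(\omega)\,d\omega\right) \\
&\approx 2\, S_X(\omega_0)\,\frac{\delta\omega}{2\pi}
\end{aligned} \tag{4.34}$$

since $S_X(-\omega) = S_X(\omega)$. Expressed in terms of $f_0$

$$P_0 \approx 2\, S_X(2\pi f_0)\,\delta f \tag{4.35}$$

with the factor of 2 appearing because our filter responds to both negative and positive frequency components of $X$.

Hence $S_X$ is indeed a Power Spectral Density with units $\mathrm{V}^2/\mathrm{Hz}$ (assuming unit impedance).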
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4.6 White and Coloured Processes 6 
4.6.1 White Noise 

If we have a zero-mean Wide Sense Stationary process $X$, it is a White Noise Process if its ACF is a delta function at $\tau = 0$, i.e. it is of the form:

$$r_{XX}(\tau) = P_X\,\delta(\tau) \tag{4.36}$$

where $P_X$ is a constant.

The PSD of $X$ is then given by

$$\begin{aligned}
S_X(\omega) &= \int P_X\,\delta(\tau)\, e^{-j\omega\tau}\,d\tau \\
&= P_X\, e^{-j\omega 0} \\
&= P_X
\end{aligned} \tag{4.37}$$

Hence $X$ is white, since it contains equal power at all frequencies, as in white light.
$P_X$ is the PSD of $X$ at all frequencies.
But:

$$\text{Power of } X = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\,d\omega = \infty \tag{4.38}$$

so the White Noise Process is unrealizable in practice, because of its infinite bandwidth.

However, it is very useful as a conceptual entity and as an approximation to 'nearly white' processes which have finite bandwidth, but which are 'white' over all frequencies of practical interest. For 'nearly white' processes, $r_{XX}(\tau)$ is a narrow pulse of non-zero width, and $S_X(\omega)$ is flat from zero up to some relatively high cutoff frequency and then decays to zero above that.

4.6.2 Strict Whiteness and i.i.d. Processes 

Usually the above concept of whiteness is sufficient, but a much stronger definition is as follows: 

Pick a set of times $\{t_1, t_2, \ldots, t_N\}$ to sample $X(t)$.

If, for any choice of $\{t_1, t_2, \ldots, t_N\}$ with $N$ finite, the random variables $X(t_1), X(t_2), \ldots, X(t_N)$ are jointly independent, i.e. their joint pdf is given by

$$f_{X(t_1), X(t_2), \ldots, X(t_N)}(x_1, x_2, \ldots, x_N) = \prod_{i=1}^{N} f_{X(t_i)}(x_i) \tag{4.39}$$

and the marginal pdfs are identical, i.e.

$$f_{X(t_1)} = f_{X(t_2)} = \cdots = f_{X(t_N)} = f_X \tag{4.40}$$

then the process is termed Independent and Identically Distributed (i.i.d).

If, in addition, $f_X$ is a pdf with zero mean, we have a Strictly White Noise Process.
An i.i.d. process is 'white' because the variables $X(t_i)$ and $X(t_j)$ are jointly independent, even when separated by an infinitesimally small interval between $t_i$ and $t_j$.



6 This content is available online at <http://cnx.Org/content/mlll05/2.4/>. 




4.6.3 Additive White Gaussian Noise (AWGN) 

In many systems the concept of Additive White Gaussian Noise (AWGN) is used. This simply means 
a process which has a Gaussian pdf, a white PSD, and is linearly added to whatever signal we are analysing. 

Note that although 'white' and 'Gaussian' often go together, this is not necessary (especially for 'nearly white' processes).

E.g. a very high speed random bit stream has an ACF which is approximately a delta function, and hence is a nearly white process, but its pdf is clearly not Gaussian - it is a pair of delta functions at $+V$ and $-V$, the two voltage levels of the bit stream.

Conversely a nearly white Gaussian process which has been passed through a lowpass filter (see next 
section) will still have a Gaussian pdf (as it is a summation of Gaussians) but will no longer be white. 

4.6.4 Coloured Processes 

A random process whose PSD is not white or nearly white is often known as a coloured noise process.

We may obtain coloured noise $Y(t)$ with PSD $S_Y(\omega)$ simply by passing white (or nearly white) noise $X(t)$ with PSD $P_X$ through a filter with frequency response $H(\omega)$, such that, from (4.27) in our discussion of Spectral Properties of Random Signals,

$$\begin{aligned}
S_Y(\omega) &= S_X(\omega)\,|H(\omega)|^2 \\
&= P_X\,|H(\omega)|^2
\end{aligned} \tag{4.41}$$

Hence if we design the filter such that

$$|H(\omega)| = \sqrt{\frac{S_Y(\omega)}{P_X}} \tag{4.42}$$

then $Y(t)$ will have the required coloured PSD.

For this to work, $S_X(\omega)$ need only be constant (white) over the passband of the filter, so a nearly white process which satisfies this criterion is quite satisfactory and realizable.

Using (4.25) from our discussion of Spectral Properties of Random Signals and (4.36), the ACF of the coloured noise is given by

$$\begin{aligned}
r_{YY}(\tau) &= r_{XX}(\tau) * h(-\tau) * h(\tau) \\
&= P_X\,\delta(\tau) * h(-\tau) * h(\tau) \\
&= P_X\, h(-\tau) * h(\tau)
\end{aligned} \tag{4.43}$$

where $h(t)$ is the impulse response of the filter.

Figure 4.2 from the previous discussion shows two examples of coloured noise, although the upper waveform is more 'nearly white' than the lower one, as can be seen in Figure 4.2(c), in which the upper PSD is flatter than the lower PSD. In these cases, the coloured waveforms were produced by passing uncorrelated random noise samples (white up to half the sampling frequency) through half-sine filters (as in equation (2.7) from our discussion of Random Signals) of length $T_b$ = 10 and 50 samples respectively.
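The relation $r_{YY}(\tau) = P_X\, h(-\tau) * h(\tau)$ can be evaluated directly for sampled filters. The sketch below assumes a particular sampled half-sine shape, $h[n] = \sin(\pi (n + 0.5)/T_b)$, as a stand-in for the filter of equation (2.7), which is not reproduced in this section; the lengths 10 and 50 follow the text.

```python
import math

def half_sine(length):
    """Assumed half-sine impulse response: h[n] = sin(pi*(n+0.5)/length), n = 0..length-1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def coloured_acf(h, p_x, lag):
    """r_YY[lag] = P_X * sum_n h[n] h[n+|lag|], the discrete form of P_X h(-tau)*h(tau)."""
    lag = abs(lag)
    return p_x * sum(h[n] * h[n + lag] for n in range(len(h) - lag))

h10, h50 = half_sine(10), half_sine(50)
lags = list(range(0, 60, 5))
acf10 = [coloured_acf(h10, 1.0, k) / coloured_acf(h10, 1.0, 0) for k in lags]
acf50 = [coloured_acf(h50, 1.0, k) / coloured_acf(h50, 1.0, 0) for k in lags]

for k, a10, a50 in zip(lags, acf10, acf50):
    print(k, round(a10, 3), round(a50, 3))
```

The normalized ACF for the length-10 filter reaches zero by a lag of 10 samples, while the length-50 filter gives a much wider ACF, matching the slower fluctuation and narrower PSD of the lower waveform described above.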
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